Predictive Test for Melanoma Patient Benefit from Interleukin-2 (IL2) Therapy

ABSTRACT

A method is disclosed for predicting in advance whether a melanoma patient is likely to benefit from high dose IL2 therapy in treatment of the cancer. The method makes use of mass spectrometry data obtained from a blood-based sample of the patient and a computer configured as a classifier and making use of a reference set of mass spectral data obtained from a development set of blood-based samples from other melanoma patients. A variety of classifiers for making this prediction are disclosed, including a classifier developed from a set of blood-based samples obtained from melanoma patients treated with high dose IL2 as well as melanoma patients treated with an anti-PD-1 immunotherapy drug. The classifiers developed from anti-PD-1 and IL2 patient sample cohorts can also be used in combination to guide treatment of a melanoma patient.

PRIORITY

This application claims priority benefits to U.S. provisional application Ser. No. 62/289,587 filed Feb. 1, 2016, and U.S. provisional application Ser. No. 62/369,289 filed Aug. 1, 2016. The content of each of the above-referenced applications is incorporated by reference herein.

FIELD

This invention relates to a method for predicting in advance of treatment whether a melanoma patient is likely to benefit from administration of high dose IL2 therapy in treatment of the cancer.

BACKGROUND

Interleukin-2 (IL2) is a cytokine signaling molecule in the immune system. It is a protein that regulates the activities of white blood cells (leukocytes, often lymphocytes) that are responsible for immunity. There are different dosages of IL2 across the United States and across the world being used to treat patients. The efficiency and side effects of different dosages are often a point of disagreement. Usually, in the U.S., the higher dosage option is used, depending on the cancer, response to treatment, and general health of the patient. Patients are typically given the high dosages for five consecutive days, three times a day, for fifteen minutes. The patient is given approximately 10 days to recover between treatment dosages. IL2 is delivered intravenously for this type of dosing, and administration at hospital is generally required to enable proper monitoring of side effects.

High dose IL2 therapy has been approved for the treatment of renal cell carcinoma and melanoma. It is the only immunotherapy that offers the chance of a cure (a lasting complete response) to around 10% of patients. Both in metastatic renal cell carcinoma (R Fisher, S Rosenberg, G Fyfe, Long-term survival update for high-dose recombinant interleukin-2 in patients with renal cell carcinoma. Cancer J Sci Am. 2000 February; 6 Suppl 1:S55-7) and in metastatic melanoma (M Atkins, M Lotze, J Dutcher, et al., High-dose recombinant interleukin 2 therapy for patients with metastatic melanoma: analysis of 270 patients treated between 1985 and 1993. J Clin Oncol 1999 July; 17(7):2105-16), a proportion of patients experience durable complete responses with little or no long term toxicity from treatment. However, high dose IL2 therapy requires hospitalization for 1-2 weeks during each of (usually) two treatment courses. Close monitoring by an experienced medical team is required during this period due to the likelihood of severe side effects from capillary leak syndrome. These are short-term, however, with patients recovering to pre-treatment status within about 3 days of the end of IL2 administration (see, e.g., A Amin and R White Jr, High-dose interleukin-2: is it still indicated for melanoma and RCC in an era of targeted therapies. Oncology (Williston Park). 2013 July; 27(7):680-91).

There have been efforts to find pre-treatment tests or biomarkers able to predict which patients will experience these durable responses from IL2 therapy (see, e.g. M Sabatino, S Kim-Schulze, M Panelli, et al., Serum Vascular Endothelial Growth Factor and Fibronectin Predict Clinical Response to High-Dose Interleukin-2 Therapy, J Clin Oncol. 2009 Jun. 2; 27(16) 2645-2651), but, as yet, none have passed adequate validation. The “SELECT” trial, for example, designed to assess the ability of a test integrating IHC staining for carbonic anhydrase-9 with histological sub-classification to predict response to IL2 therapy for treatment of patients with metastatic renal cell carcinoma (D McDermott, S Cheng, S Signoretti, et al., The High-Dose Aldesleukin “Select” Trial: A Trial to Prospectively Validate Predictive Models of Response to Treatment in Patients with Metastatic Renal Cell Carcinoma. Clin Cancer Res. 2015 Feb. 1; 21(3):561-8.) did not validate this test as useful for predicting response. While it may be possible to identify small proportions of patients (around 10% or less), based on non-clear cell histology or baseline clinical and pathological characteristics (e.g., University of California Los Angeles Survival After Nephrectomy and Immunotherapy Score, or UCLA SANI Score, ibid) who will not respond to IL2 therapy, little progress has been made in providing a clinically useful test for patient selection for this treatment. Some earlier observations, however, may be of interest. In particular, acute response proteins or regulators of acute response may serve as important dynamic markers of pre-treatment prognosis and predictors of response in the course of treatment. It was shown that non-responders have high pre-treatment levels of C-reactive protein (CRP) and interleukin 6 (IL-6). In contrast, patients with good responses have significantly lower levels of these proteins at baseline, and develop high circulating levels of IL-6 and CRP at different time intervals during the infusion (Broom J, Heys S D, Whiting P H, Park K G, Strachan A, Rothnie I, Franks C R, Eremin O. Interleukin 2 therapy in cancer: identification of responders. Br J Cancer. 1992 December; 66(6):1185-7; Deehan D J, Heys S D, Simpson W G, Broom J, Franks C, Eremin O. In vivo cytokine production and recombinant interleukin 2 immunotherapy: an insight into the possible mechanisms underlying clinical responses. Br J Cancer. 1994 June; 69(6):1130-5.)

The lack of a test able to select patients for IL2 therapy has become more of a problem with the advent of new effective immunotherapy options, such as nivolumab and pembrolizumab and the combination of ipilimumab and nivolumab in melanoma and nivolumab in renal cell carcinoma, all recently approved by the FDA. These checkpoint inhibitor therapies, while not producing the cures characteristic of IL2, do produce extremely durable responses, at least able to turn cancer into a chronic condition for some patients. Hence, there is now an urgent need for tests to help physicians and patients choose between or sequence IL2 and these other immunotherapeutic options.

SUMMARY

In a first aspect, a method is disclosed for predicting in advance whether a melanoma patient is likely to benefit from high dose IL2 therapy in treatment of the cancer. The method includes the steps of:

a) performing mass spectrometry on a blood-based sample of the patient and obtaining mass spectrometry data of the sample;

b) performing a classification of the mass spectrometry data with the aid of a computer implementing a classifier, wherein the classifier is developed from a development set of samples from melanoma patients treated with the high dose IL2 therapy and consists of a hierarchical combination of classifiers 1 and 2. Classifier 1 is developed from the development set of samples and a set of mass spectral features identified as being associated with an acute response biological function and generates either an Early class label or a Late class label, or the equivalent. Classifier 2 is developed from a subset of samples in the development set which are classified as Late by classifier 1. Classifier 2 also generates an Early class label or a Late class label, or the equivalent. If the sample from the patient is classified as Late by both classifier 1 and classifier 2, the patient is predicted to have a greater likelihood of benefit from the high dose IL2 therapy as compared to if the sample from the patient is classified as Early by either classifier 1 or classifier 2.
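The decision rule of step b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `hierarchical_label`, `classifier_1` and `classifier_2` are hypothetical names, the classifiers are stand-in callables rather than the trained classifiers of this disclosure, and applying classifier 2 only to samples that classifier 1 labels Late is one way of realizing the rule that a Late result requires both classifiers to return Late.

```python
from typing import Callable, Sequence

# Hypothetical type: a classifier maps a sample's feature values to a class label.
Classifier = Callable[[Sequence[float]], str]


def hierarchical_label(features: Sequence[float],
                       classifier_1: Classifier,
                       classifier_2: Classifier) -> str:
    """Return 'Late' only if both classifiers return 'Late', else 'Early'."""
    # Assumption: classifier 2 is consulted only when classifier 1 says Late.
    if classifier_1(features) == "Early":
        return "Early"
    return classifier_2(features)
```

With this rule a sample labeled Early by either classifier receives the Early label, which matches the comparison stated in step b).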

In one embodiment classifier 1 and classifier 2 use the features for performing classification of the sample recited in Table 33.

It is noted below that classifier 2 alone performs similarly to the hierarchical combination of classifiers 1 and 2. Accordingly, in another aspect a method for predicting in advance whether a melanoma patient is likely to benefit from high dose IL2 therapy in treatment of the cancer is disclosed comprising the steps of: a) performing mass spectrometry on a blood-based sample of the patient and obtaining mass spectrometry data of the sample; b) performing a classification of the mass spectrometry data with the aid of a computer implementing a classifier 2, wherein the classifier 2 is developed from a subset of a development set of samples from melanoma patients treated with the high dose IL2 therapy which have been classified as Late or the equivalent by a classifier 1 using a set of mass spectral features identified as being associated with an acute response biological function; wherein if the sample from the patient is classified as Late or the equivalent by classifier 2 the patient is predicted to have a greater likelihood of benefit from the high dose IL2 therapy as compared to if the sample from the patient is classified by classifier 2 as Early or the equivalent.

In other aspects, a computer configured as a classifier for predicting melanoma patient benefit from high dose IL2 and a testing system for conducting the tests of this disclosure are also considered inventive aspects.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A and 1B illustrate Kaplan-Meier plots for progression-free survival (PFS) (FIG. 1A) and overall survival (OS) (FIG. 1B) for the cohort of 114 patients with baseline samples and acquired spectra which were used to develop the classifiers of this disclosure.

FIG. 2 is a plot of the distribution of normalization scalars by disease control rate (DCR) groups.

FIG. 3 is a plot of the distribution of normalization scalars by overall survival groups.

FIG. 4 is a portion of a mass spectrum showing definitions of features (peaks).

FIG. 5 is a plot of partial ion current normalization scalars by DCR groups.

FIG. 6 is a plot of partial ion current normalization scalars by overall survival groups.

FIGS. 7A and 7B are a flow chart of a computer-implemented procedure for classifier generation which is referred to as “Diagnostic Cortex.” The methodology of FIGS. 7A-7B was applied to the mass spectral data from the development sample set and resulted in a classifier (i.e., a set of parameters stored in the computer) which makes predictions of melanoma patient benefit from high dose IL2 therapy in advance of treatment.

FIGS. 8A and 8B illustrate Kaplan-Meier plots of OS (FIG. 8A) and PFS (FIG. 8B) by classification groups Early and Late produced by the Classifier 1 described below.

FIGS. 9A and 9B illustrate Kaplan-Meier plots of OS (FIG. 9A) and PFS (FIG. 9B) by classification groups Early and Late produced by the Classifier 2 described below when classifying the 70 samples which were classified as Late by Classifier 1.

FIGS. 10A and 10B illustrate Kaplan-Meier plots of OS (FIG. 10A) and PFS (FIG. 10B) by classification groups Early and Late produced by a combination of Classifiers 1 and 2.

FIGS. 11A and 11B illustrate Kaplan-Meier plots of OS (FIG. 11A) and PFS (FIG. 11B) by classification groups Early and Late produced by the Classifier 2 described below when classifying all the 114 samples in the development set.

FIG. 12 illustrates a schema or hierarchical manner of defining a test using Classifier 1 and Classifier 2.

FIG. 13 is a block diagram of a practical testing environment for conducting a test on a blood-based sample from a melanoma patient to determine in advance of treatment whether they are likely to obtain relatively greater benefit from high dose IL2 in treatment of the cancer.

FIGS. 14A-14F are a set of Kaplan-Meier plots of PFS and OS for the 112 patients in the IL2 cohort showing classifications produced by the IL2 classifier of this disclosure (FIGS. 14A-14B) as well as classifications produced by the Example 1 and Example 5 classifiers (“IS2” and “IS6”, respectively) of U.S. provisional application Ser. No. 62/289,587 filed Feb. 1, 2016, see FIGS. 14C-14F.

FIGS. 15A-15B show Kaplan-Meier plots of PFS (FIG. 15A) and OS (FIG. 15B) by combination of the IS2 and IL2 classifiers.

FIGS. 16A-16B show Kaplan-Meier plots of PFS (FIG. 16A) and OS (FIG. 16B) by combination of the IS6 and IL2 classifiers.

FIGS. 17A-17F are a set of Kaplan-Meier plots of TTP and OS for the 119 patients in a sample set referred to as “Moffitt Cohort”, namely a set of melanoma patients treated with nivolumab, with classifications of this set of samples produced by the IL2 classifier (FIGS. 17A-17B), the IS2 classifier (FIGS. 17C-17D) and the IS6 classifier (FIGS. 17E-17F).

FIGS. 18A-18B are Kaplan-Meier plots of TTP and OS (FIGS. 18A and 18B, respectively) by combination of the IS2 and IL2 classifiers.

FIGS. 19A and 19B are Kaplan-Meier plots of TTP and OS (FIGS. 19A and 19B, respectively) by combination of the IS6 and IL2 classifiers.

FIG. 20 is a plot of PFS of a cohort of patients used for validation of the IL2 classifier.

FIG. 21 is a Kaplan-Meier plot of PFS by IL2 test classification for the validation cohort.

DETAILED DESCRIPTION

This document will initially describe a set of blood-based samples obtained from a population of melanoma patients in advance of treatment and the generation and processing of mass spectral data which is used for classifier development. Later, in the context of FIG. 7, we describe the development of a computer-implemented classifier from this mass spectral data which is able to predict whether a melanoma patient is likely to benefit from high dose IL2 in treatment of the cancer. We further illustrate the results of the classifiers developed from the FIG. 7 procedure. A laboratory testing environment for conducting the test is also described in conjunction with FIG. 13. We later describe the application of two classifiers from our U.S. provisional application Ser. No. 62/289,587 filed Feb. 1, 2016 on the IL2 sample set as well as the IL2 classifier performance on the 119 melanoma patient samples used to develop the anti-PD-1 classifier of the U.S. provisional application Ser. No. 62/289,587 filed Feb. 1, 2016.

Our method for obtaining data for use in classifier generation and making predictive tests uses matrix assisted laser desorption and ionization time of flight (MALDI-TOF) mass spectrometry. Preferred embodiments use the so-called Deep MALDI methods described in U.S. Pat. No. 9,279,798, the content of which is incorporated by reference herein.

A. Samples, Mass Spectral Data Acquisition and Pre-Processing of Spectra

Patient Samples

One hundred and fourteen blood-based (serum) samples were available with good quality mass spectra and associated clinical data. No baseline clinical data was available for this patient cohort. The samples were acquired from melanoma patients pre-treatment with high dose IL2.

Kaplan-Meier plots for progression-free survival (PFS) and overall survival (OS) for the cohort of 114 patients with baseline samples and acquired spectra are shown in FIGS. 1A and 1B, respectively. Response data is summarized in table 1. Median OS is 813 days (95% CI: 623-1037 days) and median PFS is 79 days (95% CI: 69-94 days).

TABLE 1 Response data for all 114 patients with available clinical data and spectra from pretreatment samples

                    n (%)
CR                  8 (7)
PR                  13 (11)
Minimal Response    6 (5)
SD                  26 (23)
PD                  60 (53)
NE/NA               1 (1)

All patients with complete response are still progression-free, with median follow up time of 1092 days (range 184-1547 days). Four of the 13 partial responders are still progression-free, with median follow up time of 1026 days (range 697-1435).

Sample Preparation

Samples were thawed and 3 μl aliquots of each experimental sample (i.e. one of the samples from patients subsequently treated with IL2) and quality control serum (a pooled sample obtained from serum of five healthy patients, purchased from ProMedDx, “SerumP3”) were spotted onto VeriStrat® serum cards (Therapak). The cards were allowed to dry for 1 hour at ambient temperature after which the whole serum spot was punched out with a 6 mm skin biopsy punch (Acuderm). Each punch was placed in a centrifugal filter with 0.45 μm nylon membrane (VWR). One hundred μl of HPLC grade water (JT Baker) was added to the centrifugal filter containing the punch. The punches were vortexed gently for 10 minutes then spun down at 14,000 rcf for two minutes. The flow-through was removed and transferred back on to the punch for a second round of extraction. For the second round of extraction, the punches were vortexed gently for three minutes then spun down at 14,000 rcf for two minutes. Twenty microliters of the filtrate from each sample was then transferred to a 0.5 ml eppendorf tube for MALDI analysis.

All subsequent sample preparation steps were carried out in a custom designed humidity and temperature control chamber (Coy Laboratory). The temperature was set to 30° C. and the relative humidity at 10%.

An equal volume of freshly prepared matrix (25 mg of sinapinic acid per 1 ml of 50% acetonitrile: 50% water plus 0.1% TFA) was added to each 20 μl serum extract and the mix vortexed for 30 sec. The first three aliquots (3×2 μl) of sample:matrix mix were discarded into the tube cap. Eight aliquots of 2 μl sample:matrix mix were then spotted onto a stainless steel MALDI target plate (SimulTOF). The MALDI target was allowed to dry in the chamber before placement in the MALDI mass spectrometer.

This set of samples was processed for MALDI analysis in four batches. QC samples were added to the beginning (two preparations) and end (two preparations) of each batch run.

Spectral Acquisition

MALDI spectra were obtained using a MALDI-TOF mass spectrometer (SimulTOF 100 s/n: LinearBipolar 11.1024.01 from Virgin Instruments, Marlborough, Mass., USA). The instrument was set to operate in positive ion mode, with ions generated using a 349 nm, diode-pumped, frequency-tripled Nd:YLF laser operated at a laser repetition rate of 0.5 kHz. External calibration was performed using the following peaks in the QC serum spectra: m/z=3320 Da, 4158.7338 Da, 6636.7971 Da, 9429.302 Da, 13890.4398 Da, 15877.5801 Da and 28093.951 Da.

Spectra from each MALDI spot were collected as 800 shot spectra that were ‘hardware averaged’ as the laser fires continuously across the spot while the stage is moving at a speed of 0.25 mm/sec. A minimum intensity threshold of 0.01 V was used to discard any ‘flat line’ spectra. All 800 shot spectra with intensity above this threshold were acquired without any further processing.

Spectral Processing

Raster Spectra Preprocessing

Alignment and Filtering

Each raster spectrum of 800 shots was processed through an alignment workflow to align prominent peaks to a set of 43 alignment points (see table 2). A filter was applied that essentially smooths noise and spectra were background subtracted for peak identification. Given the identified peaks, the filtered spectra (without background subtraction) were aligned. Additional filtering parameters required that raster spectra have at least 20 peaks and use at least 5 alignment points to be included in the pool of rasters used to assemble the average spectrum.

TABLE 2 Alignment points used to align the raster spectra

M/z
 3168.00   4153.48   4183.00   4792.00   5773.00   5802.00   6432.79
 6631.06   7202.00   7563.00   7614.00   7934.00   8034.00   8206.35
 8684.25   8812.00   8919.00   8994.00   9133.25   9310.00   9427.00
10739.00  10938.00  11527.06  12173.00  12572.38  12864.24  13555.00
13762.87  13881.55  14039.60  14405.00  15127.49  15263.00  15869.06
17253.06  18629.76  21065.65  23024.00  28090.00  28298.00

Raster Averaging

Averages were created from the pool of aligned and filtered raster spectra. A random selection of 500 raster spectra was averaged to create a final analysis spectrum for each sample of 400,000 laser shots.
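The raster filtering and averaging described above can be illustrated with a short Python sketch. It assumes each raster spectrum is a list of intensities on a common m/z grid and that the peak count and number of alignment points used are already known for each raster; the function names and the fixed random seed are hypothetical and are not taken from the software actually used.

```python
import random
from typing import List, Sequence

SHOTS_PER_RASTER = 800      # each raster spectrum is an 800 shot spectrum
RASTERS_PER_AVERAGE = 500   # rasters randomly selected per analysis spectrum


def passes_raster_filter(n_peaks: int, n_alignment_points: int) -> bool:
    """Keep rasters with at least 20 peaks and at least 5 alignment points."""
    return n_peaks >= 20 and n_alignment_points >= 5


def average_rasters(rasters: Sequence[Sequence[float]],
                    n_select: int = RASTERS_PER_AVERAGE,
                    seed: int = 0) -> List[float]:
    """Average a random selection of n_select rasters, point by point."""
    if len(rasters) < n_select:
        raise ValueError("not enough rasters passed filtering")
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    chosen = rng.sample(list(rasters), n_select)
    n_points = len(chosen[0])
    return [sum(r[i] for r in chosen) / n_select for i in range(n_points)]
```

Since 500 rasters of 800 shots each are combined, the resulting average corresponds to 400,000 laser shots per sample.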

Deep MALDI Average Spectra Preprocessing

Background Estimation and Subtraction

The two window method of background estimation and subtraction was used as it was discovered that this method better estimates the background in regions where small peaks are surrounded by much larger peaks. Table 3 lists the windows that were used for estimation and subtraction of background from the analysis spectra (averages).

TABLE 3 Background estimation windows

                  m/z      width
Wide windows      3000     60000
                  30000    60000
                  31000    100000
Medium windows    3000     7500
                  30000    7500
                  31000    10000

Normalization by Bin Method

The bin method was used to compare clinical groups of interest to ensure that normalization windows are not selected that have desirable characteristics for distinguishing the groups of interest. The normalization windows were reduced using the reference replicates spotted alongside the IL2 samples on each plate to remove features that are intrinsically unstable. To do this, a CV cutoff of 0.2 was applied. Normalization windows with CVs greater than 0.2 were rejected from consideration. To further prune the normalization windows, disease control status (DCR) was used to compare features. A p value cutoff of 0.5 was applied (features below 0.5 were rejected) and a CV cutoff of 0.65 (features above 0.65 were rejected). As a final step, clinical groups defined as Early (with OS below the median OS) and Late (with OS above the median OS) were compared. Features with P values below 0.5 and CVs greater than 0.80 were removed. The remaining features used as normalization windows are listed below in table 4.

TABLE 4 Normalization by bin windows

Left M/z    Right M/z
 3785.03     4078.74
 4324.18     4390.64
 4491.64     4688.07
 4689.55     4742.28
 4744.51     4799.83
 4801.32     4874.47
 4946.13     5077.58
 5080.92     5259.89
 6377.92     6510.48
 7229.72     7513.77
 8402.18     8498.62
 9054.25     9171.92
 9172.81     9271.02
 9547.95     9811.60
10908.61    11356.51
19212.92    20743.82

The resulting normalization scalars were compared between the groups to ensure the combination of windows was not significantly associated with groups. The plots of FIGS. 2 and 3 demonstrate that the distribution of normalization scalars is not associated with the clinical groups of interest.
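The first pruning step described above, rejecting windows whose coefficient of variation (CV) across the reference replicates exceeds 0.2, might be sketched as follows. The data layout (a dictionary from a window identifier to its values over the replicates) and the use of the sample standard deviation are assumptions made for illustration; the text does not specify either.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    return stdev(values) / mean(values)


def stable_windows(window_values: Dict[str, Sequence[float]],
                   cv_cutoff: float = 0.2) -> List[str]:
    """Return windows whose CV over reference replicates is <= cv_cutoff."""
    return [name for name, values in window_values.items()
            if coefficient_of_variation(values) <= cv_cutoff]
```

The later screens against the DCR and Early/Late groups follow the same pattern with their own p value and CV cutoffs.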

Average Spectra Alignment

The peak alignment of the average spectra is typically very good; however, a fine-tune alignment step was performed to address minor differences in peak positions in the spectra. A set of alignment points was identified and applied to the analysis spectra (table 5).

TABLE 5 Calibration points used to align the spectral averages

M/z
 3315.17   4153.33   4456.88   4709.91   5066.47   6432.85   6631.27
 7934.36   8916.29   9423.10   9714.25  12868.19  13766.39  14044.69
14093.30  15131.43  15871.93  16077.64  17255.58  17383.45  18630.93
21069.05  21168.45  28084.44  28292.86  67150.37

Feature Definitions

Feature definitions (peaks in the spectra) were selected in an iterative process over the batches. Several spectra were loaded simultaneously and features defined. The entire M/z region of interest was examined and all features were defined. After the first round, a second set of spectra was examined. Some features were not optimally defined from the first round and were adjusted to meet requirements of the second set of spectra. New features were identified that were not present in the first set of spectra. This process was continued until the final set was determined. As a final step, each batch was examined to determine if any additional features could be defined that could only be identified with knowledge from many spectra loaded simultaneously. Several features were identified that may have heightened susceptibility to peptide modifications that take place during the sample preparation procedure. These manifest in spectra as specific m/z regions where the peaks change in intensity and shape dependent on the position on the plate where the sample was spotted. These regions were excluded from feature selection. A final set of 326 feature definitions was applied to the spectra and is listed in Table 32. An example of features defined using the described method is displayed in FIG. 4 with the SP3 reference spectra and spectra from batch 1 indicated.

Batch Correction of Analysis Spectra

SerumP3 analysis

Two preparations of the reference sample, SerumP3, were plated at the beginning (1,2) and end (3,4) of each run. The purpose of these samples is to ensure that variations by batch due to slight changes in instrument performance (for example, aging of the detector) can be corrected for.

To perform batch correction, one spectrum, which is an average of one of the preparations from the beginning and one from the end of the batch, must serve as the reference for the batch. The procedure used for selecting the pair is described first.

The reference samples were preprocessed as described above. All 326 features were used to evaluate the possible combinations (1-3, 1-4, 2-3, 2-4). We compared each possible combination of replicates using the function:

A=min(abs(1−ftrval1/ftrval2),abs(1−ftrval2/ftrval1))

where ftrval1 (ftrval2) is the value of a feature for the first (second) replicate of the replicate pair. This quantity A gives a measure of how similar the replicates of the pair are. For each feature, A is reported. If the value is >0.5, then the feature is determined to be discordant, or ‘Bad’. A tally of the Bad features is reported for each possible combination. If the value of A is <0.1, then the feature is determined to be concordant and reported as ‘Good’. A tally of the Good features is reported for each possible combination. Using the tallies of Bad and Good features from each possible combination, we computed the ratio of Bad/Good. The combination with the lowest ratio was reported as the most similar combination, unlikely to contain any systematic or localized outlier behavior in either of the reference spectra. If no ratio can be found that is less than 0.12, then the batch is declared a failure. Table 6 reports the combinations that were found most similar for each batch.

TABLE 6 SerumP3 preparations found to be most similar by batch

Batch     Combination
IL1_B1    2_3
IL2_B2    2_4
IL2_B3    1_4
IL2_B4    2_3
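The selection of the most similar pair of reference preparations can be sketched in Python as follows. The thresholds (0.5, 0.1 and 0.12) and the candidate pairs come from the description above; the function names, the dictionary layout keyed by preparation number, and the handling of a zero Good tally (treated as an infinite ratio) are illustrative assumptions. Feature values are assumed to be strictly positive.

```python
from itertools import product
from typing import Dict, Optional, Sequence, Tuple


def concordance(ftrval1: float, ftrval2: float) -> float:
    """A = min(|1 - ftrval1/ftrval2|, |1 - ftrval2/ftrval1|)."""
    return min(abs(1 - ftrval1 / ftrval2), abs(1 - ftrval2 / ftrval1))


def bad_good_ratio(rep_a: Sequence[float], rep_b: Sequence[float]) -> float:
    """Ratio of discordant (A > 0.5) to concordant (A < 0.1) features."""
    a_values = [concordance(x, y) for x, y in zip(rep_a, rep_b)]
    bad = sum(a > 0.5 for a in a_values)
    good = sum(a < 0.1 for a in a_values)
    return bad / good if good else float("inf")  # assumption for good == 0


def select_reference_pair(preps: Dict[int, Sequence[float]],
                          max_ratio: float = 0.12
                          ) -> Optional[Tuple[int, int]]:
    """Pick the (beginning, end) pair with the lowest Bad/Good ratio.

    Returns None when no pair has a ratio below max_ratio (batch failure).
    """
    ratios = {(b, e): bad_good_ratio(preps[b], preps[e])
              for b, e in product((1, 2), (3, 4))}
    best = min(ratios, key=ratios.get)
    return best if ratios[best] < max_ratio else None
```

In this sketch ties are broken in favor of the first pair in the order 1-3, 1-4, 2-3, 2-4, which is an arbitrary choice not addressed in the text.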

Batch Correction

Batch 1 was used as the baseline batch to correct all other batches. The reference sample was used to find the correction coefficients for each of the batches 2-4 by the following procedure.

Within each batch j (2≤j≤4), the ratio

${\hat{r}}_{i}^{j} = \frac{A_{i}^{j}}{A_{i}^{1}}$

and the average amplitude $\bar{A}_{i}^{j} = \frac{1}{2}\left(A_{i}^{j} + A_{i}^{1}\right)$ are defined for each $i$-th feature centered at $(m/z)_{i}$, where $A_{i}^{j}$ is the average reference spectra amplitude of feature $i$ in the batch being corrected and $A_{i}^{1}$ is the reference spectra amplitude of feature $i$ in batch 1 (the reference standard). It is assumed that the ratio of amplitudes between two batches follows the dependence

$r\left(\bar{A},(m/z)\right) = \left(a_{0} + a_{1}\ln(\bar{A})\right) + \left(b_{0} + b_{1}\ln(\bar{A})\right)(m/z) + c_{0}(m/z)^{2}.$

On a batch to batch basis, a continuous fit is constructed by minimizing the sum of the square residuals, $\Delta^{j} = \sum_{i}\left({\hat{r}}_{i}^{j} - r^{j}(a_{0},a_{1},b_{0},b_{1},c_{0})\right)^{2}$, and using the experimental data of the reference sample. The SerumP3 reference samples are used to calculate the correction function. Steps were taken to not include outlier points in order to avoid bias in the parameter estimates. The values of the coefficients $a_{0}$, $a_{1}$, $b_{0}$, $b_{1}$ and $c_{0}$ obtained for the different batches are listed in Appendix B (table B.1) of prior provisional application Ser. No. 62/369,289 filed Aug. 1, 2016. The projection in the ${\hat{r}}_{i}^{j}$ versus $(m/z)_{i}$ plane of the points used to construct the fit for each batch of reference spectra, together with the surface defined by the fit itself, can be plotted but the details are not particularly important and are omitted for the sake of brevity.

Once the final fit, $r^{j}\left(\bar{A},(m/z)\right)$, is determined for each batch, the next step is to correct, for all the samples, all the features (with amplitude $A$ at $(m/z)$) according to

$A_{corr} = \frac{A}{r^{j}\left(\bar{A},(m/z)\right)}.$

After this correction, the corrected $\left(\bar{A}_{i}^{j},(m/z)_{i},{\hat{r}}_{i}^{j}\right)$ feature values calculated for the reference spectra lie around the horizontal line defined by $r = 1$. Post correction coefficients are calculated to compare to quality control thresholds. These coefficients can be found in Appendix B table B.2 of prior provisional application Ser. No. 62/369,289 filed Aug. 1, 2016.
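Because the assumed ratio surface, taken here as r = (a0 + a1 ln Ā) + (b0 + b1 ln Ā)(m/z) + c0 (m/z)², is linear in its five coefficients, the least-squares fit can be illustrated with an ordinary normal-equations solve. The sketch below is not the implementation used to produce the coefficients referenced above: the function names are hypothetical, outlier exclusion is omitted, and m/z is assumed to be rescaled (for example to units of 10,000) purely to keep the small linear system well conditioned.

```python
import math
from typing import List, Sequence


def design_row(a_bar: float, mz: float) -> List[float]:
    """Terms multiplying (a0, a1, b0, b1, c0) in the assumed ratio surface."""
    ln_a = math.log(a_bar)
    return [1.0, ln_a, mz, ln_a * mz, mz * mz]


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_ratio_surface(a_bars: Sequence[float], mzs: Sequence[float],
                      ratios: Sequence[float]) -> List[float]:
    """Least-squares estimate of (a0, a1, b0, b1, c0) via normal equations."""
    rows = [design_row(a, m) for a, m in zip(a_bars, mzs)]
    k = 5
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    atb = [sum(r[i] * y for r, y in zip(rows, ratios)) for i in range(k)]
    return solve(ata, atb)


def predicted_ratio(coeffs: Sequence[float], a_bar: float, mz: float) -> float:
    """Evaluate the fitted surface r(A_bar, m/z)."""
    return sum(c * v for c, v in zip(coeffs, design_row(a_bar, mz)))


def correct_feature(amplitude: float, a_bar: float, mz: float,
                    coeffs: Sequence[float]) -> float:
    """A_corr = A / r(A_bar, m/z)."""
    return amplitude / predicted_ratio(coeffs, a_bar, mz)
```

A reference feature whose observed ratio equals the fitted value is mapped back to its batch 1 amplitude, which is the behavior described for the corrected reference spectra.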

Partial Ion Current (PIC) Normalization

The dataset was combined (batches 1-4) and examined to find regions of intrinsic stability to use as the final normalization windows. First, the univariate p values were found by comparing the DCR groups across all features. Features with p values less than 0.15 were excluded from the PIC analysis as these features may contribute meaningful information in test development. In a second screen, p values comparing OS groups (Early and Late) were computed. Again features with p values less than 0.15 were excluded from the PIC analysis. A set of 222 features were used in the PIC analysis, of which 21 were used for the final PIC normalization (Table 7). Further details on partial ion current normalization of mass spectra are found in U.S. Pat. No. 7,736,905, the content of which is incorporated by reference herein.

TABLE 7 Features used for PIC normalization

M/z
 3681   3776   3952   4010   4590   6081   6194
 6921   6947   6971   7021   7035   7053  13845
14051  14100  21066  21173  21272  21373

To normalize, the feature values from the listed features were summed for each spectrum to compute a normalization scalar. All feature values were then divided by the normalization scalar to arrive at the final table used in the diagnostic cortex. The normalization scalars were again examined by clinical group to test that the combined features, i.e. the scalars themselves, were not correlated with group. The plots of FIGS. 5 and 6 illustrate the distribution of the scalars by group. The plots for the two groups are very similar, indicating that the normalization scalars are appropriate to use.

Once the final features have been defined and the spectra subject to the above preprocessing routines (including background subtraction), feature values are obtained for each of the features listed in Table 32 for each of the samples in the development set. This is the “feature table” in the following discussion.
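The normalization step itself is simple arithmetic and can be sketched as below. The dictionary layout (feature m/z label mapped to feature value) and the function name are assumptions for illustration only.

```python
from typing import Dict, Iterable, Tuple


def pic_normalize(feature_values: Dict[int, float],
                  pic_features: Iterable[int]
                  ) -> Tuple[Dict[int, float], float]:
    """Divide every feature value by the sum over the PIC features.

    Returns the normalized feature values and the normalization scalar.
    """
    scalar = sum(feature_values[f] for f in pic_features)
    normalized = {f: v / scalar for f, v in feature_values.items()}
    return normalized, scalar
```

For example, a spectrum with values 2.0 and 3.0 at two PIC features has a normalization scalar of 5.0, and a third feature with value 10.0 becomes 2.0 after normalization.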

B. Classifier Development (FIG. 7)

After the feature table for features in the mass spectra for the 114 samples was created (as explained above) we proceeded to develop with the aid of a programmed computer a classifier using the classifier generation method shown in flow-chart form in FIGS. 7A-7B. This method, known as “combination of mini-classifiers with drop-out regularization” or “CMC/D”, or DIAGNOSTIC CORTEX™, is described at length in U.S. Pat. No. 9,477,906 of H. Röder et al., the entire content of which is incorporated by reference herein. An overview and rationale of the methodology will be provided here first, and then illustrated in detail in conjunction with FIG. 7 for the generation of the melanoma/IL2 classifier.

In contrast to standard applications of machine learning focusing on developing classifiers when large training data sets are available (the “big data” challenge), in the bio-life-sciences the problem setting is different. Here we have the problem that the number (n) of available samples, arising typically from clinical studies, is often limited, and the number of attributes (measurements) (p) per sample usually exceeds the number of samples. Rather than obtaining information from many instances, in these deep data problems one attempts to gain information from a deep description of individual instances. The present methods take advantage of this insight, and are particularly useful, as here, in problems where p>>n.

The method includes a first step a) of obtaining measurement data for classification from a multitude of samples, i.e., measurement data reflecting some physical property or characteristic of the samples. The data for each of the samples consists of a multitude of feature values, and a class label. In this example, the data takes the form of mass spectrometry data, in the form of feature values (integrated peak intensity values at a multitude of M/z ranges or peaks, see Table 32) as well as a label indicating some attribute of the sample (for example, patient Early or Late death or disease progression). In this example, the class labels were assigned by a human operator to each of the samples after investigation of the clinical data associated with the sample. The development sample set is then split into a training set and a test set and the training set is used in the following steps b), c) and d).

The method continues with a step b) of constructing a multitude of individual mini-classifiers using sets of feature values from the samples up to a pre-selected feature set size s (s = integer 1 . . . n). For example, a multitude of individual mini- or atomic classifiers could be constructed using a single feature (s=1), or pairs of features (s=2), or three of the features (s=3), or even higher order combinations containing more than 3 features. The selected value of s will normally be small enough to allow the code implementing the method to run in a reasonable amount of time, but could be larger in some circumstances or where longer code run-times are acceptable. The selection of a value of s also may be dictated by the number of measurement data values (p) in the data set, and where p is in the hundreds, thousands or even tens of thousands, s will typically be 1, or 2 or possibly 3, depending on the computing resources available. The mini-classifiers execute a supervised learning classification algorithm, such as k-nearest neighbors (kNN), in which the values for a feature, pair or triplet of features of a sample instance are compared to the values of the same feature or features in a training set, the nearest neighbors (e.g., k=9) in an s-dimensional feature space are identified, and by majority vote a class label is assigned to the sample instance for each mini-classifier. In practice, there may be thousands of such mini-classifiers depending on the number of features which are used for classification.
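By way of illustration only, the construction of kNN mini-classifiers from single features and pairs of features might be sketched in Python as follows. The function names, the toy data and the use of Euclidean distance are assumptions made for this sketch and are not taken from the classifier development described herein; k=3 is used only because the toy training set is small.

```python
from collections import Counter
from itertools import combinations
import math

def make_mini_classifier(feature_idx, train_X, train_y, k=9):
    """Return a kNN classifier restricted to the features in feature_idx."""
    def classify(sample):
        # Distance is measured only in the s-dimensional subspace
        # spanned by the selected features (Euclidean: an assumption).
        dists = sorted(
            (math.dist([sample[i] for i in feature_idx],
                       [row[i] for i in feature_idx]), label)
            for row, label in zip(train_X, train_y)
        )
        votes = Counter(label for _, label in dists[:k])
        return votes.most_common(1)[0][0]
    return classify

def build_mini_classifiers(train_X, train_y, max_s=2, k=9):
    """Build one mini-classifier per feature subset of size 1..max_s."""
    p = len(train_X[0])
    return {
        idx: make_mini_classifier(idx, train_X, train_y, k)
        for s in range(1, max_s + 1)
        for idx in combinations(range(p), s)
    }

# Hypothetical toy training set: 6 samples, 3 features each.
train_X = [[0.0, 3.0, 7.0], [0.1, 1.0, 2.0], [0.2, 2.0, 9.0],
           [1.0, 3.1, 7.5], [1.1, 0.9, 2.2], [1.2, 2.1, 8.8]]
train_y = ["Early", "Early", "Early", "Late", "Late", "Late"]

mcs = build_mini_classifiers(train_X, train_y, max_s=2, k=3)
label_a = mcs[(0,)]([0.05, 9.9, 9.9])   # single-feature mini-classifier
label_b = mcs[(0,)]([1.05, 0.0, 0.0])
```

With three features and s up to 2, the sketch yields three single-feature and three pair mini-classifiers, each of which uses the training set as its reference set.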

The method continues with a filtering step c), namely testing the performance, for example the accuracy, of each of the individual mini-classifiers to correctly classify the sample, or measuring the individual mini-classifier performance by some other metric (e.g. the difference between the Hazard Ratios (HRs) obtained between groups defined by the classifications of the individual mini-classifier for the training set samples) and retaining only those mini-classifiers whose classification accuracy, predictive power, or other performance metric exceeds a pre-defined threshold or is within pre-set limits, to arrive at a filtered (pruned) set of mini-classifiers. The class label resulting from the classification operation may be compared with the class label for the sample known in advance if the chosen performance metric for mini-classifier filtering is classification accuracy. However, other performance metrics may be used and evaluated using the class labels resulting from the classification operation. Only those mini-classifiers that perform reasonably well under the chosen performance metric for classification are maintained. Alternative supervised classification algorithms could be used, such as linear discriminants, decision trees, probabilistic classification methods, margin-based classifiers like support vector machines, and any other classification method that trains a classifier from a set of labeled training data.
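A minimal sketch of filtering by classification accuracy is given below for illustration. The accuracy limits of 0.7 and 1.0, the function names and the two stand-in mini-classifiers (simple rules used in place of kNN mini-classifiers) are hypothetical and chosen only to make the example self-contained.

```python
def accuracy(classify, X, y):
    """Fraction of samples whose predicted label matches the known label."""
    correct = sum(1 for sample, label in zip(X, y) if classify(sample) == label)
    return correct / len(y)

def filter_mini_classifiers(mcs, X, y, lower=0.7, upper=1.0):
    """Keep only mini-classifiers whose accuracy lies within [lower, upper]."""
    return {name: mc for name, mc in mcs.items()
            if lower <= accuracy(mc, X, y) <= upper}

# Hypothetical training data with one feature per sample.
X = [[0.1], [0.2], [0.9], [0.8]]
y = ["Early", "Early", "Late", "Late"]

# Stand-ins for mini-classifiers (assumed for illustration).
candidates = {
    "threshold_rule": lambda s: "Early" if s[0] < 0.5 else "Late",  # accuracy 1.0
    "always_early": lambda s: "Early",                               # accuracy 0.5
}

kept = filter_mini_classifiers(candidates, X, y)
```

Here only the stand-in with accuracy inside the limits survives; a filter on another metric, such as a hazard ratio, would replace the `accuracy` function while leaving the retention logic unchanged.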

To overcome the problem of being biased by some univariate feature selection method depending on subset bias, we take a large proportion of all possible features as candidates for mini-classifiers. We then construct all possible kNN classifiers using feature sets up to a pre-selected size (parameter s). This gives us many “mini-classifiers”: e.g. if we start with 100 features for each sample (p=100), we would get 4950 “mini-classifiers” from all different possible combinations of pairs of these features (s=2), 161,700 mini-classifiers using all possible combinations of three features (s=3), and so forth. Other methods of exploring the space of possible mini-classifiers and features defining them are of course possible and could be used in place of this hierarchical approach. Of course, many of these “mini-classifiers” will have poor performance, and hence in the filtering step c) we only use those “mini-classifiers” that pass predefined criteria. These filtering criteria are chosen dependent on the particular problem: if one has a two-class classification problem, one would select only those mini-classifiers whose classification accuracy exceeds a pre-defined threshold, i.e., are predictive to some reasonable degree. Even with this filtering of “mini-classifiers” we end up with many thousands of “mini-classifier” candidates with performance spanning the whole range from borderline to decent to excellent performance.
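The mini-classifier counts quoted above for p=100 are binomial coefficients and can be checked with a few lines of Python (the variable names are illustrative only):

```python
from math import comb

p = 100                    # number of features per sample
n_singles = comb(p, 1)     # mini-classifiers with s=1
n_pairs = comb(p, 2)       # mini-classifiers with s=2
n_triples = comb(p, 3)     # mini-classifiers with s=3

print(n_singles, n_pairs, n_triples)
```

The rapid growth from pairs to triplets is the reason s is normally kept at 1, 2 or possibly 3 when p is large.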

The method continues with step d) of generating a master classifier (MC) by combining the filtered mini-classifiers using a regularized combination method. In one embodiment, this regularized combination method takes the form of repeatedly conducting a logistic training of the filtered set of mini-classifiers to the class labels for the samples. This is done by randomly selecting a small fraction of the filtered mini-classifiers as a result of carrying out an extreme dropout from the filtered set of mini-classifiers (a technique referred to as drop-out regularization herein), and conducting logistic training on such selected mini-classifiers. While similar in spirit to standard classifier combination methods (see e.g. S. Tulyakov et al., Review of Classifier Combination Methods, Studies in Computational Intelligence, Volume 90, 2008, pp. 361-386), we have the particular problem that some “mini-classifiers” could be artificially perfect just by random chance, and hence would dominate the combinations. To avoid this overfitting to particular dominating “mini-classifiers”, we generate many logistic training steps by randomly selecting only a small fraction of the “mini-classifiers” for each of these logistic training steps. This is a regularization of the problem in the spirit of dropout as used in deep learning theory. In this case, where we have many mini-classifiers and a small training set, we use extreme dropout, where in excess of 99% of filtered mini-classifiers are dropped out in each iteration.

In more detail, the result of each mini-classifier is one of two values, either “Early” or “Late” in this example. We can then combine the results of the mini-classifiers in the spirit of a logistic regression by defining the probability of obtaining an “Early” label via standard logistic regression (see e.g. http://en.wikipedia.org/wiki/Logistic_regression):

$P\left(\text{``Early''} \mid \text{feature for a spectrum}\right) = \frac{\exp\left(\sum_{\text{mini classifiers}} w_{mc}\, I\left(mc(\text{feature values})\right)\right)}{\text{Normalization}} \qquad \text{Eq. (1)}$

where I(mc(feature values))=1 if the mini-classifier mc applied to the feature values of a sample returns “Early”, and 0 if the mini-classifier returns “Late”. The weights w_mc for the mini-classifiers are unknown and need to be determined from a regression fit of the above formula for all samples in the training set using +1 for the left hand side of the formula for the Late-labeled samples in the training set, and 0 for the Early-labeled samples, respectively. As we have many more mini-classifiers, and therefore weights, than samples, typically thousands of mini-classifiers and only tens of samples, such a fit will always lead to nearly perfect classification, and can easily be dominated by a mini-classifier that, possibly by random chance, fits the particular problem very well. We do not want our final test to be dominated by a single special mini-classifier which only performs well on this particular set and is unable to generalize well. Hence we designed a method to regularize such behavior: instead of one overall regression to fit all the weights for all mini-classifiers to the training data at the same time, we use only a few of the mini-classifiers for a regression, but repeat this process many times in generating the master classifier. For example, we randomly pick three of the mini-classifiers, perform a regression for their three weights, pick another set of three mini-classifiers, and determine their weights, and repeat this process many times, generating many random picks, i.e. realizations of three mini-classifiers. The final weights defining the master classifier are then the averages of the weights over all such realizations. The number of realizations should be large enough that each mini-classifier is very likely to be picked at least once during the entire process. This approach is similar in spirit to “drop-out” regularization, a method used in the deep learning community to add noise to neural network training to avoid being trapped in local minima of the objective function.
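The repeated regression over random picks of mini-classifiers may be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used for the classifiers described herein: the logistic fit is a plain gradient descent written for the example, an intercept term is included, samples are coded 1 for Early so that the fitted quantity corresponds to the probability in Eq. (1), weights are averaged over all iterations (counting zero when a mini-classifier is left out), and the mini-classifier outputs are invented toy data.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(columns, y, steps=200, lr=0.5):
    """Gradient-descent logistic fit; returns (intercept, weights)."""
    n = len(y)
    b, w = 0.0, [0.0] * len(columns)
    for _ in range(steps):
        grad_b, grad_w = 0.0, [0.0] * len(columns)
        for i in range(n):
            z = b + sum(wj * col[i] for wj, col in zip(w, columns))
            err = sigmoid(z) - y[i]
            grad_b += err
            for j, col in enumerate(columns):
                grad_w[j] += err * col[i]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return b, w

def dropout_combine(mc_outputs, y, leave_in=3, iterations=100, seed=0):
    """Average logistic weights over many random picks of mini-classifiers."""
    rng = random.Random(seed)
    names = sorted(mc_outputs)
    sum_w = {name: 0.0 for name in names}
    sum_b = 0.0
    for _ in range(iterations):
        chosen = rng.sample(names, leave_in)   # all others are dropped out
        b, w = fit_logistic([mc_outputs[c] for c in chosen], y)
        sum_b += b
        for c, wc in zip(chosen, w):
            sum_w[c] += wc
    return sum_b / iterations, {n_: s / iterations for n_, s in sum_w.items()}

def master_probability(intercept, weights, votes):
    """Probability of Early given each mini-classifier's 0/1 vote."""
    return sigmoid(intercept + sum(weights[m] * votes[m] for m in weights))

# Toy outputs: 1 means the mini-classifier returned Early for that sample.
y = [1, 1, 1, 0, 0, 0]                      # 1 = Early-labeled (assumed coding)
mc_outputs = {
    "a": [1, 1, 1, 0, 0, 0], "b": [1, 1, 0, 0, 0, 1], "c": [1, 0, 1, 0, 1, 0],
    "d": [0, 1, 0, 1, 0, 1], "e": [1, 1, 1, 1, 0, 0],
}
intercept, avg_w = dropout_combine(mc_outputs, y)
p_early = master_probability(intercept, avg_w,
                             {"a": 1, "b": 1, "c": 1, "d": 0, "e": 1})
```

Because each fit sees only a few mini-classifiers, a mini-classifier that happens to fit the training set perfectly (such as "a" in the toy data) contributes to only a fraction of the fits, and its averaged weight is diluted accordingly.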

Other methods for performing the regularized combination method in step (d) that could be used include:

-   Logistic regression with a penalty function like ridge regression (based on Tikhonov regularization; Tikhonov, Andrey Nikolayevich (1943). [On the stability of inverse problems]. Doklady Akademii Nauk SSSR 39 (5): 195-198).
-   The Lasso method (Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B., Vol. 58, No. 1, pages 267-288).
-   Neural networks regularized by drop-out (Nitish Srivastava, “Improving Neural Networks with Dropout”, Master's Thesis, Graduate Department of Computer Science, University of Toronto, available from the website of the University of Toronto Computer Science department).
-   General regularized neural networks (Girosi F. et al., Neural Computation, (7), 219 (1995)).

The above-cited publications are incorporated by reference herein. Our approach of using drop-out regularization has shown promise in avoiding over-fitting, and increasing the likelihood of generating generalizable tests, i.e. tests that can be validated in independent sample sets.

“Regularization” is a term known in the art of machine learning and statistics which generally refers to the addition of supplementary information or constraints to an underdetermined system to allow selection of one of the multiplicity of possible solutions of the underdetermined system as the unique solution of the extended system. Depending on the nature of the additional information or constraint applied to “regularize” the problem (i.e. specify which one or subset of the many possible solutions of the unregularized problem should be taken), such methods can be used to select solutions with particular desired properties (e.g. those using fewest input parameters or features) or, in the present context of classifier training from a development sample set, to help avoid overfitting and associated lack of generalization (i.e., selection of a particular solution to a problem that performs very well on training data but performs very poorly or not at all on other datasets). See e.g., https://en.wikipedia.org/wiki/Regularization_(mathematics). One example is repeatedly conducting extreme dropout of the filtered mini-classifiers with logistic regression training to classification group labels. However, as noted above, other regularization methods are considered equivalent. Indeed it has been shown analytically that dropout regularization of logistic regression training can be cast, at least approximately, as L2 (Tikhonov) regularization with a complex, sample set dependent regularization strength parameter λ (S Wager, S Wang, and P Liang, Dropout Training as Adaptive Regularization, Advances in Neural Information Processing Systems 25, pages 351-359, 2013 and D Helmbold and P Long, On the Inductive Bias of Dropout, JMLR, 16:3403-3454, 2015). In the term “regularized combination method” the “combination” simply refers to the fact that the regularization is performed over combinations of the mini-classifiers which pass filtering. Hence, the term “regularized combination method” is used to mean a regularization technique applied to combinations of the filtered set of mini-classifiers so as to avoid overfitting and domination by a particular mini-classifier.

The performance of the master classifier is then evaluated by how well it classifies the subset of samples forming the test set.

In step e), steps b)-d) are repeated in the programmed computer for different realizations of the separation of the set of samples into test and training sets, thereby generating a plurality of master classifiers, one for each realization of the separation of the set of samples into training and test sets. The performance of the classifier is evaluated for all the realizations of the separation of the development set of samples into training and test sets. If there are some samples which persistently misclassify when in the test set, the process optionally loops back and steps b), c), d) and e) are repeated with flipped class labels for such misclassified samples.

The method continues with step f) of defining a final classifier from one or a combination of more than one of the plurality of master classifiers. In the present example, the final classifier is defined as a majority vote of all the master classifiers resulting from each separation of the sample set into training and test sets, or alternatively by an average probability cutoff.

Turning now to FIGS. 7A-7B, the classifier development process will be described in further detail in the context of the melanoma/IL2 classifier. In FIG. 7A, the “development set” 100 is the set of 114 samples we used for classifier development and the associated mass spectrometry data. The samples were subject to deep MALDI and integrated intensity values of selected features (see 124) were calculated and stored as a feature table (see feature space 50).

Definition of Class Labels (102)

In our procedure of FIG. 7 we need to assign a class label to each of the samples in step 102. Time-to-event data was used for classifier training. In this situation class labels are not obvious and, as shown in FIG. 7A, the diagnostic cortex uses an iterative method to refine class labels at the same time as creating/training the classifier. An initial guess is made for the class labels. Typically the samples are sorted on either PFS or OS and half of the samples with the lowest time-to-event outcome are assigned the “Early” class label (early death or progression, i.e. poor outcome) while the other half are assigned the “Late” class label (late death or progression, i.e. good outcome). For the classifiers discussed in this report both OS and PFS were used. A classifier is then constructed using the outcome data and these class labels. This classifier can then be used to generate classifications for the development set samples and these are then used as the new class labels for a second iteration of the classifier construction step. This process is iterated until convergence. The group of samples with the class label Early is shown at 104 and the group of samples with the Late class label is shown at 106. We therefore have a class-labeled development set as shown at 107.
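The initial guess for the class labels, i.e. the split at the middle of the sorted time-to-event values, can be illustrated with the short sketch below. The sample identifiers and times are hypothetical, the function name is an assumption, and censoring of time-to-event data is ignored for simplicity.

```python
def initial_class_labels(times):
    """Assign Early to the half with the shortest times, Late to the rest.

    times: dict mapping sample id -> time-to-event in days (PFS or OS).
    """
    ordered = sorted(times, key=times.get)   # shortest time first
    half = len(ordered) // 2
    return {sid: ("Early" if rank < half else "Late")
            for rank, sid in enumerate(ordered)}

# Hypothetical time-to-event values in days.
times = {"s1": 410, "s2": 1250, "s3": 75, "s4": 930}
labels = initial_class_labels(times)
```

These labels serve only as a starting point; in the iterative procedure described above they are replaced by the classifications produced by each successive classifier until the labels converge.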

At step 108, we split the class-labeled development set into training and test sets, in a random manner assigning one half of the samples into a training set 112 and another half in a test set 110. In practice, many (e.g., hundreds) of separations of the development set into training and test sets are identified so that the process can loop as indicated at loop 135 over each one of these different realizations.

Feature Deselection or Feature Selection (step 52)

To be able to consider all subsets of three or more features or to attempt to improve classifier performance by dropping noisy features (those not useful for classification) it may be necessary or desirable to deselect features that are not useful for classification from the set of 326 features. This is done at step 52. Removal or deselection of features likely to be of negligible use for classification is done using a bagged feature deselection approach in which the ability of individual features to classify samples (using kNN classification) is tested across multiple randomly-drawn subsets of the development set, and features that display no consistent univariate classification potential across the many subsets are deselected. This results in a reduced feature space 122. Further details on feature deselection are set forth in Appendix C of our prior provisional application Ser. No. 62/369,289 filed Aug. 1, 2016; see also pending U.S. patent application of J. Röder et al., Ser. No. 15/091,417 filed Apr. 5, 2016, published as US patent application publication 2016/0321561, and U.S. provisional application Ser. No. 62/319,958 filed Apr. 8, 2016, the content of which is incorporated by reference herein. Feature selection based on the development sample set is prone to overfitting and is always avoided. However, in one embodiment we do use a method where subsets of features are selected from the set of 326 available features based on their association with particular biological functions as determined by a gene set enrichment analysis (GSEA) on a separate sample set. This is explained in more detail in Appendix D of our prior provisional application Ser. No. 62/369,289 filed Aug. 1, 2016. The methodology of GSEA to identify mass spectral features with particular biological functions is also set forth in U.S. patent application Ser. No. 15/207,825 filed Jul. 12, 2016 and in the articles V K Mootha, C M Lindgren, K-F Eriksson, et al., PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003; 34(3):267-73 and A Subramanian, P Tamayo, V K Mootha, et al., Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005; 102(43):15545-50. A further description is therefore omitted for the sake of brevity.

Table 33 lists reduced sets of features which were used for classifier training for Classifier 1 and Classifier 2 in the following discussion.

Creation and Filtering of Mini-Classifiers (Steps 120 and 126)

The development set samples 107 are split into training and test sets (110, 112) in multiple different random realizations, i.e., iterations through the loop 135. Six hundred and twenty-five realizations were used for this project. The procedure of FIGS. 7A-7B works best when training classes (Early and Late) have the same number of samples. Hence, if classes have different numbers of members, they are split in different ratios into test and training sets.
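One way of drawing many random training/test realizations in which the two training classes have equal size is sketched below. The rule used here, taking half of the smaller class from each class for training, is an assumption made for the illustration; the exact split ratios used for the classifiers described herein are not specified in this passage. The sample identifiers are hypothetical.

```python
import random

def balanced_split(labels, rng):
    """Split samples so that Early and Late are equally represented in training."""
    early = [s for s, c in labels.items() if c == "Early"]
    late = [s for s, c in labels.items() if c == "Late"]
    # Assumed rule: half of the smaller class is drawn from each class.
    n_train = min(len(early), len(late)) // 2
    rng.shuffle(early)
    rng.shuffle(late)
    train = early[:n_train] + late[:n_train]
    test = early[n_train:] + late[n_train:]
    return train, test

# Hypothetical class-labeled development set: 4 Early and 8 Late samples.
labels = {f"e{i}": "Early" for i in range(4)}
labels.update({f"l{i}": "Late" for i in range(8)})

rng = random.Random(0)
splits = [balanced_split(labels, rng) for _ in range(625)]
train, test = splits[0]
```

With unequal class sizes, the larger class necessarily contributes a smaller fraction of its members to the training set, which is what is meant by splitting the classes in different ratios.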

In step 120 many k-nearest neighbor (kNN) mini-classifiers (mCs) that use the training set as their reference set are constructed using subsets of features. All classifiers described in this report use k=9 and use only mCs with single features (s=1) and pairs of features (s=2).

To target a final classifier that has certain performance characteristics, the mCs are filtered in step 126. This filtering is shown by the + and − signs 128 in FIG. 7A, with the + sign indicating that a particular mC passed filtering and a − sign indicating that an mC did not pass filtering. The filtering was as follows. Each mC is applied to its training set and performance metrics are calculated from the resulting classifications of the training set. Only mCs that satisfy thresholds on these performance metrics pass filtering to be used further in the process. The mCs that fail filtering are discarded. All classifiers presented in this document used filtering based on hazard ratios. For hazard ratio filtering, the mC is applied to its training set. The hazard ratio for a specified outcome (PFS or OS) is then calculated between the group classified as Early and the rest classified as Late. The hazard ratio must lie within specified bounds for the mC to pass filtering.

Combination of Mini-Classifiers Using Logistic Regression with Dropout (Steps 130, 132)

Once the filtering of the mCs is complete, the mCs are combined into one master classifier (MC) in step 130 using a logistic regression trained on the training set class labels. To help avoid overfitting, the regression is regularized using extreme drop out with only a small number of the mCs chosen randomly for inclusion in each of the logistic regression iterations. The number of dropout iterations is selected based on the typical number of mCs passing filtering to ensure that each mC is likely to be included within the drop out process multiple times. All classifiers presented in this report left in 10 randomly selected mCs per drop out iteration and used 10,000 dropout iterations. The resulting logistic regression weights for each mC over all of the dropout iterations were then averaged for definition of the master classifier.
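The relationship between the number of dropout iterations and the number of mCs passing filtering can be made concrete with a short calculation. The figure of 2,000 mCs passing filtering is purely hypothetical and is used only to show the arithmetic; the values of 10 mCs per iteration and 10,000 iterations are those stated above.

```python
iterations = 10_000     # dropout iterations (from the text)
leave_in = 10           # mCs left in per iteration (from the text)
n_filtered = 2_000      # hypothetical number of mCs passing filtering

# Expected number of dropout iterations in which a given mC is included.
expected_inclusions = iterations * leave_in / n_filtered

# Probability that a given mC is never included in any iteration.
p_never_included = (1 - leave_in / n_filtered) ** iterations

print(expected_inclusions, p_never_included)
```

Under these assumed numbers each mC is expected to take part in about 50 regressions, and the chance of an mC never being selected is negligible.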

We then evaluated the performance of the master classifier generated at step 130 by using it to classify the members of the test set 110.

Training/Test Splits (Loop 135)

The use of multiple training/test splits in loop 135 avoids selection of a single, particularly advantageous or difficult training set for classifier creation and avoids bias in performance assessment from testing on a test set that could be especially easy or difficult to classify.

At step 136 we optionally conduct an analysis of the data from each of the training/test set splits and get the performance characteristics for the MCs and their classification results for each split at step 138.

At step 144 we determine whether any of the samples in the development set persistently misclassify when they are in the test set (110). If so, we flip the class label for such misclassified samples and via loop 146 repeat the process beginning at step 102 and continuing through steps 108, 120, 126 and 130, including looping over many different realizations of the training and test set split (loop 135).

Definition of Final Test 150 (FIG. 7B)

The output of the logistic regression that defines each MC generated at step 130 is a probability of being in one of the two training classes (Early or Late). Applying a threshold to this output produces a binary label (Early or Late) for each MC. For all classifiers presented in this report we used a cutoff threshold of 0.5. To select an overall final classification or test, a majority vote is done across all MCs (“ensemble average”). When classifying samples in the development set this is modified to incorporate in the majority vote only MCs where the sample is not in the training set (“out-of-bag majority vote”).

For the definition of the final test, it is also possible to directly average the MC probabilities to yield one average probability for a sample. When working with the development set, this approach is adjusted to average over MCs for which a given sample is not included in the training set (“out-of-bag” estimate). These average probabilities can then be converted into a binary classification by applying a cutoff. Applying a cutoff of 0.5 to the averaged probabilities gives very similar classifications to using a cutoff of 0.5 on the individual MC probabilities and then performing the majority vote over the MCs. This approach was not used to produce the results shown in this document, however.
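The out-of-bag majority vote and the alternative out-of-bag average probability can be compared in the following sketch. The data layout (each MC represented by its set of training sample identifiers and a table of output probabilities), the interpretation of the probability as the probability of Early, the handling of values exactly at the cutoff, and the toy numbers are all assumptions made for the illustration.

```python
def oob_majority_vote(sample_id, master_classifiers, cutoff=0.5):
    """Majority vote over MCs whose training set did not contain the sample."""
    votes = ["Early" if probs[sample_id] > cutoff else "Late"
             for train_ids, probs in master_classifiers
             if sample_id not in train_ids]
    n_early = votes.count("Early")
    return "Early" if n_early > len(votes) - n_early else "Late"

def oob_average_probability(sample_id, master_classifiers, cutoff=0.5):
    """Average the out-of-bag MC probabilities, then apply the cutoff."""
    oob = [probs[sample_id] for train_ids, probs in master_classifiers
           if sample_id not in train_ids]
    mean_p = sum(oob) / len(oob)
    return ("Early" if mean_p > cutoff else "Late"), mean_p

# Each entry: (ids used for training that MC, assumed P(Early) per sample).
master_classifiers = [
    ({"s1"}, {"s1": 0.9, "s2": 0.2}),
    ({"s2"}, {"s1": 0.8, "s2": 0.4}),
    (set(),  {"s1": 0.3, "s2": 0.1}),
    ({"s2"}, {"s1": 0.7, "s2": 0.9}),
]

vote_s1 = oob_majority_vote("s1", master_classifiers)
vote_s2 = oob_majority_vote("s2", master_classifiers)
avg_label_s1, avg_p_s1 = oob_average_probability("s1", master_classifiers)
```

For sample "s1" the first MC is excluded because "s1" was in its training set; the remaining three MCs give two Early votes and one Late vote, and an average probability of 0.6, so both approaches agree in this toy case.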

As another alternative, a final test could be defined at step 150 by simply selecting an MC for a particular training/test set split that has typical performance.

In the procedure of FIGS. 7A-7B, it is preferable to perform a validation of the master classifier defined at step 150 on an independent sample set, as indicated at step 152. See the validation discussion below.

One embodiment of the melanoma/IL2 predictive test presented here uses a combination of two classifiers, Classifier 1 and Classifier 2, arranged in a hierarchical manner, see FIG. 12. The parameters and development set used to generate these classifiers are shown in Table 8. It will be appreciated that the methodology of FIGS. 7A-7B was performed twice, for two different development sets as indicated in Table 8 and for two different sets of mass spectrometry features, resulting in two different final classifiers defined at step 150. The two classifiers are used in a hierarchical manner as will be explained subsequently.

TABLE 8 Parameters used for classifier development

|  | Classifier 1 | Classifier 2 |
| --- | --- | --- |
| Development Set | All 114 samples and associated spectra | 70 samples and associated spectra classified as “Late” by Classifier 1 |
| k (for kNN classifiers) | 9 | 9 |
| # drop out iterations | 10,000 | 10,000 |
| # of mCs included in each drop out iteration | 10 | 10 |
| Features included in development | 22 (See Table 33) | All 326 (Table 32) with bagged feature deselection |
| Filtering criterion | HR for OS | HR for PFS |

Results

The IL2 test uses a hierarchy of two classifiers. The first classifier (Classifier 1) uses GSEA Acute Response features, i.e. the feature selection in FIG. 7A resulting in the feature space 50 was based on selection of a subset of all measured mass spectral features that were associated with acute response with a GSEA p value of 0.05 or less. (The GSEA method and its use to find subgroups of mass spectral features associated with certain biological functions are described in Appendix D of our prior provisional application 62/369,289 filed Aug. 1, 2016 and the patent and technical literature cited previously. The application of GSEA to correlate mass spectral features with particular biological functions is also described at length in U.S. application Ser. No. 15/207,825 filed Jul. 12, 2016, and the relevant description of the procedure in that document is incorporated by reference herein.) The features are listed in Table 33. The classifier was developed on the whole set of 114 patients and gave classification labels of “Early” and “Late”. The performance was assessed using Kaplan-Meier plots of OS and PFS between samples classified as Early and Late, together with corresponding hazard ratios (HRs) and log-rank p values. The results are summarized in Tables 9-11 and the Kaplan-Meier plots of FIGS. 8A-8B.

TABLE 9 Response characteristics by classification groups

|  | Early N = 44, n (%) | Late N = 70, n (%) |
| --- | --- | --- |
| CR | 0 (0) | 8 (11) |
| PR | 7 (16) | 6 (9) |
| SD | 9 (20) | 17 (24) |
| PD | 26 (59) | 34 (49) |
| N/A | 1 (2) | 0 (0) |
| Minimal Response | 1 (2) | 5 (7) |

TABLE 10 Medians for time-to-event endpoints by classification group

|  | Median OS (95% CI) in days | Median PFS (95% CI) in days |
| --- | --- | --- |
| Early | 596 (340-721) | 68 (49-86) |
| Late | 1105 (752-undefined) | 93 (72-149) |

TABLE 11 Survival analysis statistics between classification groups

|  | OS log-rank p | OS CPH p | OS HR (95% CI) | PFS log-rank p | PFS CPH p | PFS HR (95% CI) |
| --- | --- | --- | --- | --- | --- | --- |
| Early vs Late | 0.001 | 0.001 | 2.38 (1.44-3.94) | 0.009 | 0.009 | 1.71 (1.14-2.57) |

The Kaplan-Meier plots of overall survival (OS) and progression free survival (PFS) by Early and Late classification groups are shown in FIGS. 8A-8B, respectively. FIGS. 8A-8B clearly show that the samples in the development set classified as Early have relatively worse OS and PFS as compared to the samples classified as Late.

Second Classifier “Classifier 2”

The second classifier uses all features of Table 32 and then a bagged feature deselection step 52 (FIG. 7A, described in the patent literature cited previously, including US patent application publication 2016/0321561). The features remaining after feature deselection are listed in Table 33. The classifier was developed according to the procedure of FIGS. 7A-7B using as a development set 100 only the 70 samples that were classified as Late by Classifier 1. This classifier classified these 70 samples again as Early or Late. The results are summarized in Tables 12-14 and in the Kaplan-Meier plots of FIGS. 9A-9B.

TABLE 12 Response characteristics by classification groups

|  | Early N = 31, n (%) | Late N = 39, n (%) |
| --- | --- | --- |
| CR | 0 (0) | 8 (21) |
| PR | 3 (10) | 3 (8) |
| SD | 8 (26) | 9 (23) |
| PD | 18 (58) | 16 (41) |
| N/A | 0 (0) | 0 (0) |
| Minimal Response | 2 (6) | 3 (8) |

TABLE 13 Medians for time-to-event endpoints by classification group

|  | Median OS (95% CI) in days | Median PFS (95% CI) in days |
| --- | --- | --- |
| Early | 1094 (597-undefined) | 84 (50-127) |
| Late | 1193 (685-undefined) | 147 (77-331) |

TABLE 14 Survival analysis statistics between classification groups

|  | OS log-rank p | OS CPH p | OS HR (95% CI) | PFS log-rank p | PFS CPH p | PFS HR (95% CI) |
| --- | --- | --- | --- | --- | --- | --- |
| Early vs Late | 0.738 | 0.739 | 1.13 (0.56-2.29) | 0.012 | 0.014 | 1.95 (1.15-3.30) |

The Kaplan-Meier plots of FIGS. 9A-9B of OS and PFS by classification group show that the Early and Late groups have similar OS, but in the PFS plot those samples which classify as Early have relatively worse PFS as compared to the samples which classify as Late.

Hierarchical Combination of Classifiers 1 and 2 (FIG. 12)

The combined classifier uses the classification “Early” from the first classifier and then both “Early” and “Late” classification labels from the second classifier. If an Early label is generated by either Classifier 1 or Classifier 2, the Early label is reported. If the Late label is generated by Classifier 2, the Late label is reported, as per FIG. 12. In Tables 15-17 and in FIGS. 10A-10B, Early and Late are the reported class labels as just described. The performance was assessed using Kaplan-Meier plots of OS and PFS between samples classified as Early and Late, together with corresponding hazard ratios (HRs) and log-rank p values. The results are summarized in Tables 15-17 and FIGS. 10A-10B.
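The reporting logic of the hierarchical combination just described can be expressed compactly. In the sketch below the two classifiers are represented by placeholder functions; these stand-ins, the feature names and the numeric thresholds are hypothetical and serve only to exercise the decision logic.

```python
def hierarchical_label(sample, classifier_1, classifier_2):
    """Report Early if Classifier 1 says Early; otherwise report Classifier 2's label."""
    if classifier_1(sample) == "Early":
        return "Early"
    return classifier_2(sample)

# Placeholder classifiers (assumed for illustration only).
def toy_classifier_1(sample):
    return "Early" if sample["f1"] > 0.5 else "Late"

def toy_classifier_2(sample):
    return "Early" if sample["f2"] > 0.5 else "Late"

label_x = hierarchical_label({"f1": 0.9, "f2": 0.1}, toy_classifier_1, toy_classifier_2)
label_y = hierarchical_label({"f1": 0.2, "f2": 0.8}, toy_classifier_1, toy_classifier_2)
label_z = hierarchical_label({"f1": 0.2, "f2": 0.1}, toy_classifier_1, toy_classifier_2)
```

A sample is therefore reported as Late only when both classifiers return Late, since Classifier 2 is consulted only for samples that Classifier 1 classifies as Late.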

TABLE 15 Response characteristics by classification groups

|  | Early N = 75, n (%) | Late N = 39, n (%) |
| --- | --- | --- |
| CR | 0 (0) | 8 (21) |
| PR | 10 (13) | 3 (8) |
| SD | 17 (23) | 9 (23) |
| PD | 44 (59) | 16 (41) |
| N/A | 1 (1) | 0 (0) |
| Minimal Response | 3 (4) | 3 (8) |

TABLE 16 Medians for time-to-event endpoints by classification group

|  | Median OS (95% CI) in days | Median PFS (95% CI) in days |
| --- | --- | --- |
| Early | 653 (557-852) | 70 (57-86) |
| Late | 1193 (685-undefined) | 147 (77-331) |

TABLE 17 Survival analysis statistics between classification groups

|  | OS log-rank p | OS CPH p | OS HR (95% CI) | PFS log-rank p | PFS CPH p | PFS HR (95% CI) |
| --- | --- | --- | --- | --- | --- | --- |
| Early vs Late | 0.036 | 0.038 | 1.81 (1.03-3.16) | 0.001 | 0.001 | 2.12 (1.36-3.30) |

FIGS. 10A-10B illustrate the Kaplan-Meier plots of OS and PFS, respectively, by classification group for a classifier configured as a hierarchical combination of Classifiers 1 and 2 as depicted in FIG. 12.

Note that the Early samples have significantly worse OS and PFS as compared to the samples classified as Late.

Similar results can be obtained by applying Classifier 2 to all samples in the cohort and using just the classification produced by Classifier 2, rather than the stacked or hierarchical approach of FIG. 12. This produces only three changes in label, for samples that were classified by Classifier 1 as Early but classified by Classifier 2 as Late (one with progressive disease (PD), one with stable disease (SD), one with partial response (PR)). The results obtained from applying Classifier 2 alone to all 114 samples are shown in the Kaplan-Meier plots of FIGS. 11A-11B and Tables 18-20.

TABLE 18 Response characteristics by classification groups

|  | Early N = 72, n (%) | Late N = 42, n (%) |
| --- | --- | --- |
| CR | 0 (0) | 8 (19) |
| PR | 9 (13) | 4 (10) |
| SD | 16 (22) | 10 (23) |
| PD | 43 (60) | 17 (40) |
| N/A | 1 (1) | 0 (0) |
| Minimal Response | 3 (4) | 3 (7) |

TABLE 19 Medians for time-to-event endpoints by classification group

|  | Median OS (95% CI) in days | Median PFS (95% CI) in days |
| --- | --- | --- |
| Early | 647 (537-852) | 68 (56-86) |
| Late | 1193 (752-undefined) | 147 (77-202) |

TABLE 20 Survival analysis statistics between classification groups

|  | OS log-rank p | OS CPH p | OS HR (95% CI) | PFS log-rank p | PFS CPH p | PFS HR (95% CI) |
| --- | --- | --- | --- | --- | --- | --- |
| Early vs Late | 0.020 | 0.023 | 1.90 (1.09-3.28) | 0.001 | 0.002 | 2.00 (1.30-3.08) |

FIGS. 11A-11B show the Kaplan-Meier plots of OS and PFS by classification group produced by Classifier 2 on the whole set of 114 samples. Again note the clear separation in the Early and Late groups, with the Early group having relatively worse OS and PFS as compared to the Late group.

Hence, in view of the above, there are several embodiments of practical classifiers and tests for melanoma patient benefit from IL2, namely either the hierarchical combination of Classifiers 1 and 2 as per FIG. 12, or just Classifier 2 alone.

Table 34 lists the class labels assigned to the 114 samples in the development sample set by the combination of Classifier 1 and Classifier 2 as per FIG. 12.

Reproducibility

To assess test reproducibility, Classifier 1 and Classifier 2 were run on two sets of spectra generated from 119 samples collected from patients with advanced melanoma. The two sets of spectra were produced using independent sample preparation and spectral acquisition several weeks apart. Spectral acquisition and sample preparation procedures were identical to those described above.

The results obtained for the spectra for Classifier 1 and Classifier 2 were combined to produce an overall classification for each sample for each run and the results compared to assess test reproducibility in table 21. Label concordance between the two runs was 97%.

TABLE 21 Test reproducibility (Combination of Classifier 1 and Classifier 2)
                      First Run
                      Early    Late
Second Run   Early    80       2
             Late      2       35
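The 97% concordance figure follows directly from the counts in Table 21. The short sketch below recomputes it; the function name and the dictionary layout are illustrative assumptions, not part of the test itself.

```python
def label_concordance(table):
    """Fraction of samples receiving the same label in both runs.

    table[second_run_label][first_run_label] = number of samples.
    """
    total = sum(sum(row.values()) for row in table.values())
    agree = sum(table[label][label] for label in table)
    return agree / total


# Counts taken from Table 21
table_21 = {
    "Early": {"Early": 80, "Late": 2},
    "Late": {"Early": 2, "Late": 35},
}
print(f"{label_concordance(table_21):.1%}")  # prints 96.6%, i.e. 97% after rounding
```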

Biological Interpretation

Gene set enrichment analysis methods were used to examine the association of various biological processes with the test classifications. Details of the method are given in the patent and technical literature cited previously, see also Appendix D of our prior provisional application 62/369,289 filed Aug. 1, 2016, see also pages 106-146 of pending U.S. application Ser. No. 15/207,825 filed Jul. 12, 2016, and Appendix K of U.S. provisional application Ser. No. 62/289,587 filed Feb. 1, 2016, the content of which is incorporated by reference herein. Table 22 shows the univariate p values for the association of the biological processes with test classifications in an independent sample set of 49 samples for which matched deep MALDI spectra and protein panel data were available. No corrections were made for multiple testing. Note that for these 49 samples the results of the combination of Classifier 1 and Classifier 2 and the results of simply taking the classifications of Classifier 2 for all samples are identical.

TABLE 22 p values from GSEA of association of test classifications with biological processes
Protein Set Description                            Enrichment Score (definition 1)   p-value
Acute inflammatory response                         0.501                            <0.001
Activation of innate immune response                0.332                             0.733
Regulation of adaptive immune response              0.324                             0.596
Positive regulation of glycolytic process          −0.651                             0.059
Immune T-cells                                      0.181                             0.904
Immune B-cells                                     −0.362                             0.329
Cell cycle regulation                              −0.268                             0.379
Natural killer regulation                          −0.180                             0.979
Complement system                                   0.625                             0.001
Cancer-experimental                                 0.718                             0.575
Acute response                                      0.512                             0.126
Cytokine activity                                  −0.254                             0.534
Wound healing                                      −0.466                             0.011
Interferon                                          0.170                             0.959
Interleukin-10                                      0.212                             0.558
Growth factor receptor signaling                   −0.293                             0.053
Immune response                                     0.278                             0.065
Immune Response Type 1                              0.408                             0.538
Immune Response Type 2                              0.560                             0.378
Immune Response-Complement                         −0.176                             0.847
Immune Response-Complement-Acute Response          −0.204                             0.580
Acute phase                                         0.597                             0.004
Hypoxia                                            −0.262                             0.551
Cancer                                             −0.182                             0.673
Cell adhesion                                       0.216                             0.569
Mesenchymal transition                             −0.325                             0.704
Extracellular matrix-restricted source, UNIPROT    −0.387                             0.216
Extracellular matrix-from different sources        −0.256                             0.523
Angiogenesis                                       −0.239                             0.529

Acute inflammatory response, complement system, acute phase, and wound healing showed associations with test classifications at the p<0.05 significance level.

It is possible to present running sum plots used in the GSEA and the proteins from the biological process protein sets in the leading edges of these plots, using the methods described at pages 128-129 in our prior patent application Ser. No. 15/207,825 filed Jul. 12, 2016 and the paper of Subramanian et al. We created such plots of the running sum for the four biological processes identified as having meaningful associations with test classifications, namely: acute inflammatory response, complement system, wound healing and acute phase. Tables 35-38 show the proteins in the leading edges of the running sums for acute inflammatory response, complement system, wound healing, and acute phase and their individual correlations with test classifications Early and Late in this work.

Conclusions

Using deep MALDI-TOF mass spectra obtained from pre-treatment serum samples taken from patients receiving IL2 therapy for advanced melanoma, we were able to use the Diagnostic Cortex (FIGS. 7A-7D) to create a test able to define a subset of patients (class label Late) where 21% experienced complete response, compared with only 7% in the unselected cohort, i.e. a tripling of complete response rate. This enrichment of very good outcomes in the “Late” classification group of the overall test based on a hierarchical combination of Classifier 1 and Classifier 2 (i.e. a “Late” classification produced from Classifier 1 and a “Late” classification produced from Classifier 2) greatly improves the risk/cost to benefit ratio of IL2 therapy and would help to maintain it as, or raise it to, an attractive option for first-line therapy for patients with advanced melanoma whose serum is classified as “Late”.

The test showed good reproducibility of 97% classification concordance in an independent cohort of 119 melanoma patients.

Gene set enrichment analyses showed that test classifications are associated with the biological functions acute phase, acute inflammatory response, complement system and wound healing. This is consistent with previous observations that high pre-treatment levels of CRP and IL-6 are associated with lack of response to IL2 therapy. In addition, some mass spectral features used in Classifier 2 have been tentatively identified as proteins associated with acute phase response (m/z 23049: C reactive protein (CRP); m/z 11686: serum amyloid A).

Validation of IL-2 Classifier on Independent Sample Set

We conducted a validation exercise in which the IL2 test described above was applied to samples collected prior to treatment from advanced melanoma patients treated with IL2 with or without stereotactic body radiation therapy (SBRT).

The IL2 test was developed on 114 pretreatment serum samples from the IL2Select study in collaboration with Drs. Ryan Sullivan (Massachusetts General Hospital Cancer Center) and David McDermott (Beth Israel Hospital). The goal of this development was to identify a patient subpopulation enriched for high dose IL2 benefit, in particular containing most of the complete responders (CRs). The results of this work indicated that it is possible to find a group of patients, i.e., those whose serum has a Late label under the IL2 test, which contained all the CRs of subjects with available samples. The durable response rate at 1000 days follow-up in this group was 25%.

The purpose of this study was to evaluate the performance of this test in an independent blinded cohort. As the size of this set is small, and because there are differences in treatment through the addition of radiation therapy, this study is exploratory.

Patients and Samples

Samples were available for 37 patients. Baseline characteristics for the cohort are summarized in table 23.

TABLE 23 Baseline characteristics of the cohort
Attribute                                  Median (Range)
Age                                         55 (20-76)
Baseline LDH                               219 (124-1984)

Attribute                                  n(%)
Baseline LDH                    <ULN       27 (73)
                                ≥ULN       10 (27)
ECOG PS                         0          25 (68)
                                1          11 (30)
                                2           1 (3)
Gender                          Female      9 (24)
                                Male       28 (76)
Prior SBRT                      Yes        19 (51)
                                No         18 (49)
Prior Interferon                Yes         8 (22)
                                No         29 (78)
Any prior therapy               Yes        22 (59)
                                No         15 (41)
Any prior therapy Except SBRT   Yes        11 (30)
                                No         26 (70)
Prior Ipilimumab                Yes         2 (5)
                                No         35 (95)

The Kaplan-Meier plot of progression-free survival (PFS) for the entire cohort is shown in FIG. 20 and best response is summarized in table 24. Note that PFS information was not available for two patients, and so these were censored for PFS at day 1.

TABLE 24 Best response for the analysis cohort
      n (%)
CR     6 (16)
PR     9 (24)
SD     6 (16)
PD    16 (43)

Results

Twenty-one (57%) of the samples were classified as IL2 test Early and the remaining 16 (43%) were classified as IL2 test Late.

Best response is summarized by test classification in table 25. Five of the six complete responses are in the Late classification group (Fisher's exact test p for CR vs no CR=0.066; Fisher's exact test p for response (CR+PR vs SD+PD)=0.107).

TABLE 25 Best response by test classification
      Early n(%)    Late n(%)
CR     1 (5)         5 (31)
PR     5 (24)        4 (25)
SD     4 (19)        2 (13)
PD    11 (52)        5 (31)
PR, Early: PFS events at 208 days, 690 days, and 1274 days, censored at 780 days and 1078 days.
PR, Late: PFS events at 369 days, 431 days, and 498 days, censored at 1351 days.
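The two Fisher's exact p values quoted for best response can be recomputed from the Table 25 counts. The sketch below uses only the Python standard library and the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one; the function name is an illustrative assumption, and the statistical software actually used for the analysis is not stated in this document.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    # Unnormalized hypergeometric weights, kept as exact integers
    weights = {
        x: comb(col1, x) * comb(n - col1, row1 - x)
        for x in range(lowest, highest + 1)
    }
    observed = weights[a]
    as_or_more_extreme = sum(w for w in weights.values() if w <= observed)
    return as_or_more_extreme / comb(n, row1)


# Counts from Table 25 (Early n = 21, Late n = 16)
p_cr = fisher_exact_two_sided(1, 20, 5, 11)        # CR vs no CR
p_response = fisher_exact_two_sided(6, 15, 9, 7)   # CR+PR vs SD+PD
print(round(p_cr, 3), round(p_response, 3))  # prints 0.066 0.107
```

The same counts give the complete response rates discussed in the text: 6/37 (16%) in the unselected cohort and 5/16 (31%) in the Late group.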

FIG. 21 is a Kaplan-Meier plot of PFS by IL2 test classification. Statistics are shown in Table 26.

TABLE 26 Progression-free survival analysis statistics
                       Early (n = 21)    Late (n = 16)
Median PFS (95% CI)    265 (59-543)      498 (65-1300)
HR (95% CI)            0.54 (0.24-1.19)
log-rank p value       0.121
Cox p value            0.127

Patient characteristics are summarized by test classification in Table 27.

TABLE 27 Baseline characteristics by test classification
Attribute                              Early             Late             P value
Age                   Median (Range)    57 (30-71)        52 (20-76)      0.688*
Baseline LDH          Median (Range)   263 (126-1984)    210 (124-403)    0.082*

Attribute                              n(%)              n(%)
Baseline LDH          <ULN             12 (57)           15 (94)          0.023
                      ≥ULN              9 (43)            1 (6)
ECOG PS               0                11 (52)           14 (88)          0.073**
                      1                 9 (43)            2 (13)
                      2                 1 (5)             0 (0)
Gender                Female            4 (19)            5 (31)          0.458
                      Male             17 (81)           11 (69)
Prior SBRT            Yes              10 (48)            9 (56)          0.743
                      No               11 (52)            7 (44)
Prior Interferon      Yes               2 (10)            6 (38)          0.055
                      No               19 (90)           10 (63)
Any prior therapy     Yes              12 (57)           10 (63)          >0.999
                      No                9 (43)            6 (38)
Any prior except SBRT Yes               5 (24)            6 (38)          0.475
                      No               16 (76)           10 (63)
Prior Ipilimumab      Yes               2 (10)            0 (0)           0.496
                      No               19 (90)           16 (100)
*Mann-Whitney, **Chi-squared, all others-Fisher's exact

Within this cohort test classification is significantly associated with baseline LDH (cutoff set to ULN=333 IU/L) and shows a trend to association with prior interferon treatment and performance status.

Table 28 shows the results of multivariate analysis of PFS, including covariates found to have at least a trend to association with test classification.

TABLE 28 Multivariate analysis of PFS including test classification, performance status, baseline LDH level, and prior interferon treatment
                                       HR (95% CI)         p value
Test classification (Early vs Late)    0.64 (0.25-1.64)    0.354
ECOG PS (0 vs 1 or 2)                  1.38 (0.60-3.17)    0.453
LDH (<ULN vs ≥ULN)                     2.68 (1.05-6.86)    0.040
Prior Interferon (No vs Yes)           1.74 (0.67-4.53)    0.258

The hazard ratio between Early and Late test classifications is somewhat increased (i.e., somewhat smaller effect size) in multivariate analysis, with the main effect coming from the inclusion of LDH into the analysis, likely due to the fact that all but one patient classified as Late in this small cohort had low baseline LDH (below ULN). It should be noted that in the larger cohort of patients from Moffitt Cancer Center used for the development of the immunotherapy test of our prior patent application Ser. No. 15/207,825 filed Jul. 12, 2016, the majority of whom had already received at least one prior systemic treatment, many with ipilimumab, baseline LDH was generally much higher and 78% of patients classified as IL2 test Late had LDH greater than ULN.

This validation exercise supports the following conclusions:

1. Application of the IL2 test to samples from the trial of high dose IL2 with or without SBRT produced 43% “Late” classifications, in line with the proportion of 37% in samples used for the development of the IL2 test.

2. Five of the six patients with complete response were assigned a “Late” classification, raising the CR rate from 16% in the unselected population to 31% in the “Late” subgroup. Partial responses were split between both classification groups and response rate was numerically, but not statistically significantly, larger in the Late group (56%) than in the Early group (29%), as may be expected given the size of the cohort.

3. The hazard ratio for PFS (HR=0.54) was somewhat larger than, but not inconsistent with, what had been found in the development cohort (HR=0.47).

4. Further validation of test performance, including investigation of effect size when adjusted for known prognostic factors, in larger cohorts is required.

Within the limits of this small validation cohort, the performance of the IL2 test was consistent with results found in the development cohort. The test was able to enrich the proportion of complete responders and overall responders from 16% and 41%, respectively, in the unselected population to 31% and 56%, respectively, in the good prognosis subgroup.

C. Practical Testing System (FIG. 13)

Once the classifier or classifiers as described above have been developed, their parameters and reference sets can be stored and implemented in a general purpose computer and used to generate a class label for a blood-based sample, e.g., in accordance with the test described above. The class label can predict in advance whether a melanoma patient is likely to obtain relatively greater benefit from high dose IL2 therapy, i.e., where the classifier (or classifiers) produce the Late class label or the equivalent.

FIG. 13 is an illustration of a laboratory testing center or system for processing a test sample (in this example, a blood-based sample from a melanoma patient) using a classifier generated in accordance with FIG. 7A-7B. The system includes a mass spectrometer 1506 and a general purpose computer 1510 having CPU 1512 implementing a classifier 1520 coded as machine-readable instructions and a memory 1514 storing a reference mass spectral data set including a feature table 1522 of class-labeled mass spectrometry data. This reference mass spectral data set forming the feature table 1522 will be understood to be the mass spectral data (integrated intensity values of predefined features, see Table 33), associated with a development sample set to create the classifier resulting from procedure FIG. 7A-7B, or some subset thereof, e.g., the subset of patients who test Late by classifier 1 where the test uses classifier 2 alone. This data set could be from all the samples, e.g., for classifier 1, or a subset of the samples (e.g., the set for classifier 2), or both. It will be appreciated that the mass spectrometer 1506 and computer 1510 of FIG. 13 could be used to generate the classifier 1520 in accordance with the process of FIG. 7A-7B.

The operation of the system of FIG. 13 will be described in the context of conducting a predictive test for predicting melanoma patient benefit from high dose IL2 therapy. The following discussion assumes that the classifier 1520 is already generated at the time of use of the classifier to generate a class label (Early or Late, or the equivalent) for a test sample.

The system of FIG. 13 obtains a multitude of samples 1500, e.g., blood-based samples (serum or plasma) from diverse cancer (e.g., melanoma) patients and generates a class label for the samples as a fee-for-service. The samples 1500 are used by the classifier 1520 (implemented in the computer 1510) to make predictions as to whether the patient providing a particular sample is likely or not likely to benefit from high dose IL2 therapy. The outcome of the test is a binary class label such as Early or Late or the like which is assigned to the patient blood-based sample. The particular moniker for the class label is not particularly important and could be generic such as “class 1”, “class 2” or the like, but as noted earlier the class label is associated with some clinical attribute relevant to the question being answered by the classifier. As noted earlier, in the present context the Early class label is associated with a prediction of relatively less benefit (e.g., relatively shorter overall survival or progression free survival), and the Late class label is associated with a prediction of relatively greater benefit (e.g., relatively longer overall survival or progression free survival).

The samples may be obtained on serum cards or the like in which the blood-based sample is blotted onto a cellulose or other type card. Aliquots of the sample are spotted onto one or several spots of a MALDI-TOF sample “plate” 1502 and the plate inserted into a MALDI-TOF mass spectrometer 1506. The mass spectrometer 1506 acquires mass spectra 1508 from each of the spots of the sample. The mass spectra are represented in digital form and supplied to a programmed general purpose computer 1510. The computer 1510 includes a central processing unit 1512 executing programmed instructions. The memory 1514 stores the data representing the mass spectra 1508. Ideally, the sample preparation, spotting and mass spectrometry steps are the same as those used to generate the classifier in accordance with FIG. 7A-7B.

The memory 1514 also stores a data set representing classifier 1520, which includes a) a reference mass spectral data set 1522 in the form of a feature table of N class-labeled spectra, where N is some integer number, in this example a development sample set of spectra used to develop the classifier as explained above or some sub-set of the development sample set. The classifier 1520 includes b) code 1524 representing a kNN classification algorithm (which is implemented in the mini-classifiers as explained above), including the features and depth of the kNN algorithm (parameter s) and identification of all the mini-classifiers passing filtering, c) program code 1526 for executing the final classifier generated in accordance with FIG. 7 on the mass spectra of patients, including logistic regression weights and data representing master classifier(s) forming the final classifier, including probability cutoff parameter, mini-classifier parameters for each mini-classifier that passed filtering, etc., and d) a data structure 1528 for storing classification results, including a final class label for the test sample.

The memory 1514 also stores program code 1530 for implementing the processing shown at 1550, including code (not shown) for acquiring the mass spectral data from the mass spectrometer in step 1552; a pre-processing routine 1532 for implementing the background subtraction, normalization and alignment step 1554 (details explained above), filtering and averaging of the 800 shot spectra at multiple locations per spot and over multiple MALDI spots to make a single 400,000+ shot average spectrum (as explained above), a module (not shown) for calculating integrated intensity values at predefined m/z positions in the background subtracted, normalized and aligned spectrum (step 1556), and a code routine 1538 for implementing the final classifier 1520 using the reference dataset feature table 1522 on the values obtained at step 1556. The process 1558 produces a class label at step 1560. The module 1540 reports the class label as indicated at 1560 (i.e., “Early” or “Late” or the equivalent).
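To make the data flow from feature table to class label more concrete, the following is a highly simplified sketch of how a kNN mini-classifier, a logistic regression master classifier, and an ensemble average with a probability cutoff could fit together. It is not the code 1524, 1526 or 1538 itself: all function names, the dictionary layout, the Euclidean distance, the majority-vote encoding, the simple mean over master classifiers, and the 0.5 cutoff are assumptions made for illustration, and details such as mini-classifier filtering and drop-out regularization are omitted.

```python
import math


def knn_mini_classifier(reference, sample, features, k):
    """Majority vote of the k nearest reference spectra on a small feature subset.

    reference : list of (feature_values, label), feature_values being a dict
                keyed by feature name (e.g., an m/z position)
    Returns 1.0 if most of the k neighbours are labeled 'Late', else 0.0.
    """
    def distance(values):
        return math.sqrt(sum((values[f] - sample[f]) ** 2 for f in features))

    nearest = sorted(reference, key=lambda item: distance(item[0]))[:k]
    late_votes = sum(1 for _, label in nearest if label == "Late")
    return 1.0 if 2 * late_votes > k else 0.0


def master_probability(mini_outputs, weights, intercept):
    """Logistic regression combination of mini-classifier outputs."""
    z = intercept + sum(w * m for w, m in zip(weights, mini_outputs))
    return 1.0 / (1.0 + math.exp(-z))


def final_label(reference, sample, masters, k, cutoff=0.5):
    """Average the master classifier probabilities and apply a cutoff."""
    probabilities = []
    for master in masters:
        minis = [
            knn_mini_classifier(reference, sample, subset, k)
            for subset in master["feature_subsets"]
        ]
        probabilities.append(
            master_probability(minis, master["weights"], master["intercept"])
        )
    mean_probability = sum(probabilities) / len(probabilities)
    return "Late" if mean_probability >= cutoff else "Early"


# Hypothetical toy reference set and master classifier (NOT real spectral data)
toy_reference = [
    ({"f1": 1.0, "f2": 1.0}, "Late"),
    ({"f1": 1.1, "f2": 0.9}, "Late"),
    ({"f1": 1.2, "f2": 1.1}, "Late"),
    ({"f1": 5.0, "f2": 5.0}, "Early"),
    ({"f1": 5.1, "f2": 4.9}, "Early"),
    ({"f1": 4.9, "f2": 5.2}, "Early"),
]
toy_masters = [
    {"feature_subsets": [("f1",), ("f1", "f2")], "weights": [2.0, 2.0], "intercept": -2.0}
]
print(final_label(toy_reference, {"f1": 1.05, "f2": 1.0}, toy_masters, k=3))
```

In this sketch the reference set plays the role of the feature table 1522, and the per-master weights stand in for the stored logistic regression weights; a production implementation would load both from memory rather than defining them inline.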

The program code 1530 can include additional and optional modules, for example a feature correction function code 1536 (described in U.S. patent application publication 2015/0102216) for correcting fluctuations in performance of the mass spectrometer, a set of routines for processing the spectrum from a reference sample to define a feature correction function, a module storing feature dependent noise characteristics and generating noisy feature value realizations and classifying such noisy feature value realizations, modules storing statistical algorithms for obtaining statistical data on the performance of the classifier on the noisy feature value realizations, or modules to combine class labels defined from multiple individual replicate testing of a sample to produce a single class label for that sample. Still other optional software modules could be included as will be apparent to persons skilled in the art.

The system of FIG. 13 can be implemented as a laboratory test processing center obtaining a multitude of patient samples from oncologists, patients, clinics, etc., and generating a class label for the patient samples as a fee-for-service. The mass spectrometer 1506 need not be physically located at the laboratory test center but rather the computer 1510 could obtain the data representing the mass spectra of the test sample over a computer network.

D. Other Classifiers Developed from Melanoma Patient Samples Treated with Antibody Drugs Targeting the Programmed Cell Death 1 (PD-1) Checkpoint Protein.

We have developed classifiers for predicting melanoma patient benefit from anti-PD-1 drugs including nivolumab. See U.S. provisional application Ser. No. 62/289,587 filed Feb. 1, 2016, the content of which is incorporated by reference herein, and U.S. application Ser. No. 15/207,825 filed Jul. 12, 2016. Example 1 of the '587 application and the '825 application describe a classifier, referred to herein as “IS2”, which was developed from a cohort of 119 blood-based samples from melanoma patients in advance of treatment with nivolumab. The classifier was developed using the same procedure of FIG. 7A-7B. Mini-classifier filtering was performed on simple overall survival. The m/z features which were used for IS2 classifier generation are listed in Appendix A of the prior provisional application Ser. No. 62/289,587 and in the '825 application filed Jul. 12, 2016. The classifier was able to split the development set into Early and Late groups with the Late group having improved OS and PFS on nivolumab. The Kaplan-Meier plots and statistics for the classifier performance are set forth in Example 1 of application Ser. No. 62/289,587 and Example 1 of the '825 application filed Jul. 12, 2016.

We also described in our prior provisional Ser. No. 62/289,587 filed Feb. 1, 2016, at pages 113-119 thereof, Example 5, the development of an ensemble of seven different classifiers, each of which is constructed from a different subset of the 119 melanoma/nivolumab patient samples with different proportions of patients with small and large tumors. This ensemble of classifiers is referred to herein as “IS6”. The description of the ensemble of classifiers and how it was generated (using the procedure of FIG. 7A-7B for seven different development sets) is incorporated by reference herein.

Briefly, in the “IS6” classifier, the deep MALDI feature table for the pretreatment serum samples from melanoma patients treated with nivolumab at the Moffitt Cancer Center was used for classifier development. For classifier development, the 104 samples for the patients who had tumor size follow up data were used. These 104 samples were split into two groups according to baseline tumor size: the 50 patients with smallest tumors and the 54 patients with largest tumors. Each of these subsets was used as the development set to develop a classifier using the process of FIG. 7A-7B, with bagged feature deselection and filtering of mini-classifiers on overall survival.

In addition, five other subsets of the 104 sample classifier development set were defined as additional or alternative development sets. The first of these took the set of 50 patients with smallest tumors, dropped 10 of them, and replaced these with 10 patients from the set of 54 with the larger tumors. The second of these took the set of 50 patients with smallest tumors, dropped 20 of them, and replaced these with 20 patients from the set of 54. Three other development sets were defined extending this approach further. The fifth development set was accordingly a subset of the original 54 large tumor size set. In this way, 5 development sets of 50 patient samples were generated that contained different proportions of patients with smaller and larger tumor sizes (80%-20%, 60%-40%, 40%-60%, 20%-80%, and 0%-100%, respectively). For each of these 5 development sets, classifiers were generated using the same procedure of FIG. 7A-7B described in detail above, i.e., each classifier was defined as a final classifier as an ensemble average over 625 master classifiers generated from 625 test/training splits of the development set used for that classifier, and each master classifier is a logistic regression combination of a multitude of mini-classifiers that pass overall survival performance filtering criteria, and regularized by extreme drop out. Each classifier produces a binary class label for a sample, either Early or Late, and Early and Late have the same clinical meaning as explained in Example 1 of the prior provisional application 62/289,587. Hence, we obtained an ensemble of 7 different classifiers (the 5 developed as described here, plus the “large” and “small” tumor classifiers described in the “Classifiers incorporating tumor size information” section), each of which was developed on a clinically different classifier development set. It will be noted that the “large” tumor classifier described in the “Classifiers incorporating tumor size information” section and the fifth of the new classifiers generated from 50 “large” tumor patients are similar, but distinct in that they were formed from different sets of patients.

An alternative method for defining the classifier development sets with different clinical groupings is as follows:

1. Order the 104 samples by tumor size.

2. Take the 50 samples with the smallest tumor size for one classifier development set and the remaining 54 samples with the largest tumor size for another, just as described above.

3. Define 5 other classifier development sets by

   a. Dropping the 10 samples with the smallest tumor size and taking the next 50 samples for a classifier development set.

   b. Dropping the 20 samples with the smallest tumor size and taking the next 50 samples for a second classifier development set.

   c. Dropping the 30 samples with the smallest tumor size and taking the next 50 samples for a third classifier development set.

   d. Dropping the 40 samples with the smallest tumor size and taking the next 50 samples for a fourth classifier development set.

   e. Dropping the 50 samples with the smallest tumor size and taking the next 50 samples for a fifth classifier development set.

Classifiers are then developed from each of these seven classifier development sets using the procedure of FIG. 7A-7B. One then establishes rules to combine the classification results from these seven classifiers. This method of designing classifier development sets may have similar performance to the classifiers produced from the development sets described in the previous paragraphs, but may be more reproducible, for example in a rerunning of the samples.
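The alternative method of steps 1-3 amounts to taking sliding windows over the samples ordered by tumor size. A minimal sketch follows; the function name, the set names, and the placeholder sample identifiers are assumptions for illustration only.

```python
def sliding_development_sets(samples, window=50, step=10, n_windows=5):
    """Build the seven development sets from (sample_id, tumor_size) pairs.

    Step 1: order by tumor size. Step 2: the `window` smallest and the
    remaining largest. Step 3: drop the 10, 20, ... smallest and take the
    next `window` samples each time.
    """
    ordered = [sample_id for sample_id, _ in sorted(samples, key=lambda s: s[1])]
    sets = {"small": ordered[:window], "large": ordered[window:]}
    for i in range(1, n_windows + 1):
        start = i * step
        sets[f"drop_{start}"] = ordered[start:start + window]
    return sets


# Hypothetical identifiers and tumor sizes for 104 samples (NOT study data)
toy_samples = [(f"S{i:03d}", float(i)) for i in range(104)]
toy_sets = sliding_development_sets(toy_samples)
print({name: len(members) for name, members in toy_sets.items()})
```

With 104 samples this yields one set of 50, one of 54, and five windows of 50, the last of which lies entirely within the 54 samples with the largest tumors. Because the windows are fully determined by the tumor size ordering, the construction involves no random choice of which samples to drop, which may be one reason it is expected to be more reproducible.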

To conduct a test on a patient's blood-based sample, the sample is subject to mass spectrometry as described above in the description of FIG. 13. The resulting mass spectral data (integrated intensity values at the classification features used in the classifier development exercise) is then subject to classification by each of the 7 classifiers in the ensemble, using the general procedure of FIG. 13. Each of the 7 classifiers generates a class label (Early/Late or similar). The set of 7 class labels is used to define an overall classification for a test sample in accordance with a set of rules. In one particular example, samples where all classifiers in the ensemble return a good prognosis “Late” label are classified as “Good”, samples where all classifiers return a poor prognosis “Early” label are classified as “Bad”, and all other samples with mixed labels are classified as “Other”. Of course, other monikers for this ternary class label scheme could be used and the particular choice of moniker is not particularly important. Other rules for combining the 7 labels could, of course, be used.

Thus, the IS6 classifier produces labels of Good, Bad and Other depending on how the sample is classified by the ensemble of seven tumor size classifiers as explained in the provisional application Ser. No. 62/289,587, Example 5.
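The particular combination rule described above (unanimous Late gives Good, unanimous Early gives Bad, anything mixed gives Other) can be sketched in a few lines. The function name is an illustrative assumption, and as noted other rules for combining the labels could be used.

```python
def combine_ensemble_labels(labels):
    """Map the class labels from the ensemble to a ternary overall label."""
    labels = list(labels)
    if not labels:
        raise ValueError("at least one class label is required")
    if all(label == "Late" for label in labels):
        return "Good"
    if all(label == "Early" for label in labels):
        return "Bad"
    return "Other"


print(combine_ensemble_labels(["Late"] * 7))                 # prints Good
print(combine_ensemble_labels(["Early"] * 7))                # prints Bad
print(combine_ensemble_labels(["Late"] * 6 + ["Early"]))     # prints Other
```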

In our prior provisional application 62/289,587 we described how the association of the complement system with the Early and Late class labels led to further insights regarding the Example 1 classifier (“IS2”, melanoma/nivolumab). In particular, the observed upregulation of the complement system proteins in the group classified as Early may indicate that these patients have higher levels of immunosuppression, and/or higher levels of pro-tumor inflammation, related to the activation of the corresponding immune checkpoints, and as a result are less responsive to such drugs as nivolumab, ipilimumab, pembrolizumab, or other agents targeting these pathways. Interestingly, it has been shown that the complement protein C5a promotes the expression of the PD-1 ligands, PD-L1 and PD-L2. Zhang, J. Immunol. 2009; 182: 5123-5130. In this scenario one could envision that excessive complement upregulation might compete with efforts to inhibit PD-1. On the other hand, the results of recent clinical trials suggest that patients with a tumor microenvironment characterized by high expression of PD-L1 and presence of Tregs are more likely to respond to anti-PD-1, anti-CTLA-4, or high dose IL2 therapy. Though we do not know exactly how upregulation of the complement system is correlated with Example 1 classifications, this connection is in line with the biological effects of the complement system discussed at pages 94-95 of our prior provisional application Ser. No. 62/289,587. Consequently, we can expect that Example 1 classifiers (IS2) may be relevant for the broad variety of drugs affecting the immunological status of the patient, such as various immune checkpoint inhibitors and high dose IL2.

In this section of the document we describe an exercise of performing a classification of the melanoma/IL2 sample set with the IS2 and IS6 classifiers, which reveals that the IS2 and IS6 classifiers can be useful alone and in conjunction with the IL2 classifier in guiding treatment of melanoma patients.

Spectra from two of the patients in the 114 patient melanoma/IL2 cohort described above failed quality control for IS2 and IS6 testing, leaving 112 patients with matched IS2, IS6 and IL2 test classifications. Correspondence of the classifications in this cohort is summarized in table 29.

TABLE 29 IS2, IS6, and IL2 test classifications for the IL2 test development cohort
                        IL2 Test Early (N = 73)    IL2 Test Late (N = 39)
IS2   Early (N = 23)    23                          0
      Late (N = 89)     50                         39
IS6   Bad (N = 10)      10                          0
      Other (N = 49)    44                          5
      Good (N = 53)     19                         34

All samples that are IL2 test Late are IS2 Late and all samples that are IS6 Bad are IL2 Early.

Table 30 shows the breakdown of best response categories for each test classification. The partial responders are broken down by those with a PFS event prior to 1 year (8 patients), those with a PFS event after 1000 days (1 patient) and those still censored for PFS (4 patients). (No patients with a partial response had an event between 1 year and 1000 days.) All patients with a complete response remain progression-free.

TABLE 30
                    PD    SD    Minimal Response    PR (PFS <1 yr, PFS >1000 days, no PFS event)    CR    NA
IL2 Test   Early    44    15    3                   10 (7, 1, 2)                                     0     1
           Late     16     9    3                    3 (1, 0, 2)                                     8     0
IS2        Early    16     3    0                    4 (3, 1, 0)                                     0     0
           Late     44    21    6                    9 (5, 0, 4)                                     8     1
IS6        Bad       7     0    0                    3 (2, 1, 0)                                     0     0
           Other    29     9    3                    5 (4, 0, 1)                                     2     1
           Good     24    15    3                    5 (2, 0, 3)                                     6     0

FIGS. 14A-14B show the time-to-event outcomes by test classification for each test within the cohort of 112 patients with all 3 classifications. In terms of the table of responses and PFS, the IL2 test shows clear superiority at identifying a group of patients who have a higher likelihood of a complete or long term durable response to IL2 therapy. The superior stratification power of IS2 and IS6 for OS prediction is believed to be due to likely subsequent treatment with checkpoint inhibitors in patients not showing a durable response to IL2.

Breaking down by combination of IS2 and IL2 test classifications gives the Kaplan-Meier plots in FIG. 15A-15B. For PFS it is apparent that IS2 classification adds no predictive power to IL2 Test classification: all IL2 Test Late patients are IS2 Late and the PFS for IL2 Test Early patients is similar regardless of IS2 classification. For OS, however, IL2 classification adds no predictive power to IS2 classification: IS2 Late patients have similar OS regardless of whether they are IL2 Test Early or Late. The combination of these two results is consistent with the interpretation that IL2 Late patients perform well on IL2 therapy and the IS2 Late patients who are IL2 Early are able to make up for the markedly lower PFS by good performance on subsequent therapy, such as checkpoint inhibitors.

Breaking down by combination of IS6 and IL2 test classifications gives the Kaplan-Meier plots in FIGS. 16A-16B. The results bear a similar interpretation to those for the IS2 and IL2 tests. For PFS, IS6 classification adds no additional information to IL2 test classification. For OS, patients with IL2 Test classification Late have similar outcome to patients classified as IS6 Good, while patients classified as IS6 Other or Bad and IL2 test Early have inferior survival.

These data are consistent with the IL2 test and IS2 and IS6 tests being able to identify a common group of patients who perform poorly on both IL2 therapy and anti-PD-1 therapy, while the IL2 test is able to isolate a group of patients (IL2 test Late) with good outcomes on IL2 and IS2 and IS6 are able to identify patients with good outcomes on other subsequent therapies (e.g., anti-PD-1). These two groups of good performing patients intersect, but are not the same. Apparently, patients who are classified as IL2 test Early, who as a group have poor PFS on IL2 therapy, can catch up with the superior performance of IL2 test Late patients on subsequent therapy if they are also classified as IS6 Good, obtaining similar OS. These data are consistent with observations that patients treated with IL2 who do not achieve durable responses can have good outcomes on subsequent therapies.

We also performed a classification of the 119 patient samples which were used to develop the melanoma/nivolumab IS2 and IS6 classifiers (a cohort of samples referred to as “the Moffitt cohort”) with the IL2, IS2 and IS6 classifiers. Correspondence of the classifications in this cohort is summarized in Table 31.

TABLE 31 IS2, IS6, and IL2 test classifications for the Moffitt cohort

                       IL2 Test Early (N = 82)    IL2 Test Late (N = 37)
IS2   Early (N = 47)   47                         0
      Late (N = 72)    35                         37
IS6   Bad (N = 30)     29                         1
      Other (N = 55)   46                         9
      Good (N = 34)    7                          27

All samples that are IL2 test Late are IS2 Late and all samples but one (97%) that are IS6 Bad are IL2 Early.
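The two agreement statements about Table 31 follow from the cell counts. A minimal sketch of the arithmetic is shown below, using only the counts printed in the table; the variable names and the helper `column_totals` are illustrative.

```python
# Counts transcribed from Table 31 (Moffitt cohort, N = 119).
# Each value is (IL2 test Early, IL2 test Late).
is2 = {"Early": (47, 0), "Late": (35, 37)}
is6 = {"Bad": (29, 1), "Other": (46, 9), "Good": (7, 27)}


def column_totals(rows):
    """Sum the IL2 test Early and IL2 test Late columns over all rows."""
    early = sum(e for e, _ in rows.values())
    late = sum(l for _, l in rows.values())
    return early, late


# Fraction of IL2 test Late samples that are also IS2 Late.
il2_late_that_are_is2_late = is2["Late"][1] / column_totals(is2)[1]

# Fraction of IS6 Bad samples that are also IL2 test Early.
is6_bad_that_are_il2_early = is6["Bad"][0] / sum(is6["Bad"])

print(f"{il2_late_that_are_is2_late:.0%}, {is6_bad_that_are_il2_early:.0%}")
```

Both the IS2 and IS6 blocks sum to the same column totals (82 Early, 37 Late), which is a quick consistency check on the reconstructed table.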

FIGS. 17A-17B show the Kaplan-Meier plots of TTP and OS for the 119 patients in the Moffitt cohort for each set of test classifications separately. While the IL2 test has some ability to stratify patients treated with nivolumab into groups with better and worse time-to-progression (TTP) and OS, its performance for both endpoints is inferior to the IS2 and IS6 tests. Note that it is unlikely that many patients in this study received IL2 therapy after the nivolumab study therapy (as generally IL2 is given as a first line therapy and use of IL2 in the era of anti-PD-1 therapy is decreasing).

Breaking down by combination of IS2 and IL2 test classifications gives the Kaplan-Meier plots in FIGS. 18A-18B. Both in TTP and OS, outcome is determined by IS2 classification: no patients who are IL2 test Late have IS2 Early classification and patients who classify as IS2 Late have similar outcome regardless of their IL2 test classification.

Breaking down by combination of IS6 and IL2 test classifications gives the Kaplan-Meier plots in FIGS. 19A-19B.

Although not quite as clear as for IS2, possibly due to the smaller numbers in some of the subgroups, TTP and OS are determined by IS6 classification: patients who classify as IS6 Good have similar outcome regardless of their IL2 test classification, as do those who classify as IS6 Other. All but one patient classified as IS6 Bad are classified as IL2 test Early and these patients have particularly poor TTP and OS.

Conclusion

These results indicate that the IL2 test is clearly distinct from the IS2 and IS6 tests. There is a group of patients who classify as IL2 test Early and IS2 Early and/or IS6 Bad and these patients have poor outcomes on both therapies (anti-PD-1 and high dose IL2). However, while the IL2 test identifies from the remaining patients a group who do well on IL2 therapy, IS2 and IS6 identify from the remaining patients a group who do well on anti-PD-1 therapy. These groups are not identical, although there are a number of patients who are classified to both good outcome groups.

Additionally, a patient whose sample is classified by the IL2 classifier as Late and the IS6 classifier as Other might be well advised to start with IL2 therapy, but a relatively small percentage of patients fall into this category. Very little clinical data currently exists on IL2 administration after checkpoint inhibitors, so obtaining good data on sequencing of therapies in both directions (at least IL2 after checkpoint inhibitors) is challenging at present. Should such data become available, the present classifiers may prove very useful, since they can predict relative benefit of both IL2 and anti-PD-1 therapies. However, it may still be advantageous and useful to have both IS6 results and IL2 test results available to help make an informed decision about melanoma treatment and general prognosis. For example, if the patient sample is classified by the IL2 classifier as Late, the patient may wish to start therapy with high dose IL2, due to the chance of a complete remission and the duration of therapy and side effects, especially since, if the patient also tested as IS6 Good (and remained that way through the course of IL2 therapy), the patient could take nivolumab later with good chances of a good outcome. This decision would be easier if the patient sample was tested as IS6 Other (or Bad, but this is a very unlikely combination). Conversely, if the patient tested as Early under the IL2 test, the patient may be guided to start with nivolumab.

The appended claims are included as further descriptions of the disclosed inventions.

TABLE 32 Feature Definitions

    Left     Center      Right
 3071.22    3085.19    3099.16
 3099.64    3111.21    3122.77
 3123.96    3137.95    3151.93
 3192.34    3210.69    3229.04
 3231.00    3243.53    3256.07
 3296.71    3322.48    3348.26
 3348.74    3363.68    3378.62
 3380.51    3393.55    3406.59
 3407.82    3419.63    3431.44
 3434.51    3443.90    3453.30
 3453.84    3465.66    3477.47
 3477.78    3487.13    3496.49
 3497.72    3508.15    3518.58
 3530.87    3553.03    3575.20
 3575.55    3589.52    3603.48
 3603.84    3613.47    3623.10
 3623.22    3636.17    3649.13
 3667.09    3680.72    3694.34
 3694.82    3704.05    3713.28
 3713.38    3723.48    3733.58
 3747.77    3756.31    3764.84
 3766.08    3776.17    3786.25
 3786.27    3795.51    3804.76
 3805.44    3818.13    3830.83
 3832.00    3841.64    3851.28
 3877.78    3888.14    3898.49
 3902.63    3908.45    3914.27
 3915.70    3927.75    3939.80
 3940.41    3952.46    3964.50
 3995.34    4009.55    4023.77
 4039.25    4051.40    4063.54
 4080.14    4094.83    4109.53
 4112.17    4119.37    4126.57
 4127.25    4133.39    4139.52
 4165.92    4170.82    4175.73
 4198.95    4210.81    4222.68
 4243.01    4250.23    4257.46
 4257.59    4265.10    4272.62
 4272.87    4287.80    4302.74
 4329.62    4341.78    4353.93
 4354.20    4361.64    4369.08
 4369.23    4380.07    4390.91
 4392.36    4408.98    4425.60
 4426.15    4433.53    4440.90
 4445.65    4462.52    4479.39
 4500.29    4511.57    4522.85
 4539.60    4546.35    4553.09
 4553.30    4565.57    4577.84
 4578.09    4590.49    4602.88
 4618.22    4625.79    4633.36
 4636.26    4645.87    4655.47
 4667.54    4679.28    4691.02
 4694.45    4714.19    4733.92
 4745.60    4755.26    4764.92
 4766.87    4775.61    4784.35
 4784.77    4792.96    4801.15
 4801.90    4818.04    4834.18
 4836.11    4856.83    4877.54
 4878.03    4892.00    4905.97
 4926.11    4937.93    4949.75
 4950.78    4964.54    4978.30
 4978.58    4985.12    4991.65
 4992.26    5002.74    5013.22
 5014.15    5022.47    5030.79
 5033.86    5044.52    5055.18
 5055.82    5071.39    5086.96
 5092.42    5106.90    5121.37
 5121.84    5134.31    5146.78
 5152.29    5158.45    5164.61
 5164.71    5176.99    5189.28
 5211.23    5223.66    5236.09
 5240.06    5251.62    5263.18
 5274.29    5295.48    5316.67
 5351.59    5362.36    5373.12
 5394.97    5416.74    5438.51
 5442.72    5449.54    5456.36
 5511.63    5522.35    5533.08
 5537.82    5549.32    5560.82
 5561.13    5570.45    5579.78
 5618.09    5637.53    5656.98
 5666.47    5674.79    5683.10
 5684.19    5691.96    5699.72
 5699.88    5705.55    5711.22
 5714.46    5720.24    5726.02
 5726.03    5734.42    5742.81
 5744.18    5765.03    5785.88
 5786.75    5794.52    5802.30
 5803.89    5810.08    5816.27
 5816.42    5822.76    5829.11
 5832.02    5840.46    5848.89
 5851.99    5865.50    5879.02
 5879.59    5888.74    5897.90
 5898.07    5909.77    5921.47
 5923.79    5933.50    5943.21
 5943.39    5952.76    5962.13
 5973.59    5985.25    5996.91
 5998.01    6008.58    6019.14
 6019.52    6033.18    6046.84
 6067.08    6081.13    6095.18
 6095.91    6108.22    6120.52
 6120.69    6127.36    6134.04
 6134.57    6148.12    6161.67
 6164.40    6174.39    6184.38
 6186.65    6194.45    6202.25
 6202.49    6209.86    6217.23
 6217.50    6227.51    6237.52
 6275.16    6284.15    6293.14
 6293.98    6303.48    6312.97
 6322.56    6330.62    6338.67
 6338.74    6348.00    6357.25
 6378.77    6393.09    6407.42
 6407.61    6438.19    6468.77
 6470.60    6487.32    6504.03
 6521.11    6534.98    6548.85
 6549.49    6562.06    6574.64
 6575.85    6589.26    6602.67
 6603.58    6652.90    6702.22
 6715.26    6730.59    6745.93
 6798.46    6809.11    6819.77
 6825.83    6837.67    6849.52
 6849.89    6859.44    6868.99
 6869.17    6890.30    6911.42
 6911.60    6920.97    6930.34
 6931.26    6947.14    6963.03
 6963.58    6971.11    6978.64
 6979.01    6995.27    7011.52
 7012.07    7021.07    7030.07
 7030.26    7035.12    7039.99
 7040.36    7053.49    7066.62
 7066.99    7075.62    7084.26
 7118.24    7143.95    7169.66
 7178.66    7189.32    7199.97
 7254.70    7269.03    7283.36
 7283.91    7296.95    7309.99
 7310.54    7341.31    7372.07
 7375.19    7390.07    7404.95
 7405.50    7418.63    7431.76
 7433.23    7447.10    7460.97
 7461.04    7470.12    7479.20
 7479.34    7488.35    7497.36
 7603.32    7617.37    7631.42
 7760.24    7767.77    7775.30
 7775.48    7783.32    7791.16
 7803.94    7811.50    7819.05
 7819.49    7828.64    7837.80
 7871.88    7881.20    7890.53
 7984.80    7994.91    8005.01
 8006.66    8018.69    8030.72
 8131.01    8153.05    8175.09
 8192.54    8215.68    8238.82
 8239.93    8254.89    8269.86
 8307.15    8329.92    8352.70
 8353.43    8364.46    8375.48
 8378.60    8391.27    8403.95
 8404.18    8412.46    8420.74
 8420.89    8429.89    8438.90
 8457.39    8464.68    8471.97
 8472.01    8477.96    8483.90
 8484.22    8490.81    8497.40
 8499.41    8508.37    8517.33
 8520.60    8531.96    8543.32
 8555.29    8565.12    8574.94
 8575.31    8592.03    8608.74
 8650.07    8661.91    8673.76
 8716.42    8724.60    8732.77
 8733.22    8742.08    8750.93
 8753.20    8766.14    8779.08
 8800.49    8826.76    8853.03
 8860.56    8871.76    8882.96
 8883.70    8894.81    8905.92
 8906.29    8932.10    8957.90
 8959.80    8969.79    8979.78
 8981.37    8993.97    9006.57
 9007.02    9017.58    9028.14
 9028.36    9038.69    9049.02
 9056.50    9062.40    9068.30
 9068.96    9078.57    9088.18
 9089.96    9098.14    9106.31
 9113.47    9142.49    9171.51
 9196.31    9207.88    9219.45
 9233.96    9244.34    9254.72
 9254.90    9262.89    9270.88
 9271.06    9285.48    9299.89
 9308.35    9319.83    9331.31
 9345.77    9366.21    9386.64
 9387.10    9398.69    9410.29
 9411.21    9454.03    9496.84
 9554.47    9573.41    9592.35
 9613.48    9626.56    9639.65
 9640.11    9654.69    9669.27
 9688.55    9723.57    9758.58
 9759.54    9770.54    9781.54
 9782.00    9792.43    9802.87
 9843.02    9866.06    9889.11
 9900.78    9914.98    9929.17
 9930.23    9949.53    9968.83
 9981.27   10000.17   10019.06
10067.77   10077.87   10087.97
10090.48   10098.94   10107.40
10108.11   10116.48   10124.86
10126.38   10138.78   10151.18
10151.41   10161.74   10172.07
10172.76   10184.81   10196.87
10200.08   10212.14   10224.19
10224.65   10236.01   10247.38
10249.52   10262.23   10274.94
10276.56   10285.67   10294.77
10295.62   10305.69   10315.75
10331.12   10339.06   10346.99
10351.93   10357.22   10362.51
10438.84   10449.16   10459.47
10461.23   10490.76   10520.29
10520.47   10533.16   10545.85
10546.03   10557.31   10568.60
10570.41   10590.04   10609.67
10615.18   10638.37   10661.56
10662.02   10684.40   10706.79
10709.31   10734.45   10759.59
10761.75   10777.84   10793.93
10794.02   10804.46   10814.89
10827.64   10838.37   10849.11
10851.16   10857.90   10864.65
10909.38   10922.34   10935.29
10951.44   10963.37   10975.30
11028.77   11056.40   11084.03
11090.89   11107.43   11123.96
11132.45   11152.43   11172.40
11285.82   11305.10   11324.39
11354.39   11366.13   11377.86
11377.88   11389.94   11401.99
11402.35   11414.43   11426.50
11428.16   11442.74   11457.32
11464.41   11477.66   11490.90
11491.36   11502.31   11513.27
11513.71   11530.99   11548.26
11613.00   11626.92   11640.84
11643.86   11657.11   11670.36
11670.69   11686.46   11702.22
11719.74   11732.72   11745.69
11746.38   11756.13   11765.89
11769.80   11786.10   11802.40
11822.14   11835.46   11848.77
11867.09   11883.39   11899.68
11900.20   11913.40   11926.61
11927.82   11938.26   11948.69
11949.11   11964.95   11980.79
11980.83   12004.32   12027.80
12266.86   12290.16   12313.47
12436.99   12459.03   12481.07
12546.50   12573.59   12600.69
12601.37   12615.26   12629.15
12661.30   12674.73   12688.16
12688.34   12697.46   12706.58
12723.06   12738.33   12753.59
12769.89   12789.06   12808.24
12830.74   12870.91   12911.09
12937.95   12962.98   12988.01
13049.54   13076.86   13104.18
13119.56   13135.29   13151.02
13151.61   13178.66   13205.71
13259.38   13273.73   13288.08
13304.84   13325.96   13347.09
13349.01   13365.97   13382.93
13403.84   13417.46   13431.07
13472.37   13490.03   13507.68
13510.31   13524.23   13538.16
13558.27   13573.12   13587.97
13599.32   13613.04   13626.75
13627.85   13642.27   13656.70
13700.43   13720.06   13739.69
13739.92   13781.47   13823.03
13826.35   13845.18   13864.02
13864.91   13894.51   13924.11
13925.45   13942.27   13959.08
13960.63   13978.17   13995.70
14024.32   14050.51   14076.70
14077.02   14099.66   14122.31
14124.55   14152.13   14179.70
14180.60   14204.59   14228.58
14229.93   14254.82   14279.70
14280.60   14301.90   14323.20
14412.88   14435.30   14457.73
14464.45   14489.34   14514.23
14516.47   14543.37   14570.28
14571.18   14594.72   14618.26
14764.89   14786.87   14808.84
14859.96   14882.15   14904.35
14951.88   14980.13   15008.38
15493.58   15509.06   15524.53
15525.61   15557.28   15588.95
15611.89   15643.05   15674.21
15717.50   15751.16   15784.83
16261.72   16302.97   16344.23
16447.74   16504.38   16561.02
16613.12   16657.76   16702.40
17771.55   17809.44   17847.33
17971.11   17996.07   18021.03
18021.82   18048.07   18074.32
18222.60   18264.84   18307.08
18315.44   18337.76   18360.09
18360.73   18381.93   18403.14
18411.81   18440.72   18469.63
18472.84   18495.97   18519.10
18542.37   18568.09   18593.81
18594.40   18638.57   18682.74
18703.22   18742.28   18781.34
18811.43   18848.65   18885.87
19340.51   19374.24   19407.97
19848.35   19918.05   19987.74
20754.38   20781.06   20807.73
20808.15   20831.29   20854.43
20873.86   20939.16   21004.45
21006.87   21065.87   21124.88
21125.85   21173.49   21221.13
21221.61   21272.15   21322.70
21323.18   21373.24   21423.30
21636.47   21786.47   21936.48
21943.18   21978.28   22013.38
22991.54   23049.38   23107.22
23111.41   23130.40   23149.38
23149.78   23177.79   23205.80
23205.94   23266.87   23327.80
23404.15   23469.71   23535.26
24520.14   24550.89   24581.65
27895.09   27945.61   27996.12
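Each row of Table 32 defines an m/z window by its left edge, center, and right edge. One plausible way to turn a spectrum into feature values from such windows is to sum the intensity falling inside each window. The sketch below assumes that approach; the summation rule, the function `feature_values`, and the toy spectrum are illustrative assumptions and are not stated in this section. Labeling each feature by its rounded center matches the way features appear to be named in Table 33 (e.g. 3085, 3111).

```python
from bisect import bisect_left, bisect_right

# First three rows of Table 32: (left, center, right) in m/z units.
FEATURES = [
    (3071.22, 3085.19, 3099.16),
    (3099.64, 3111.21, 3122.77),
    (3123.96, 3137.95, 3151.93),
]


def feature_values(mz, intensity, features=FEATURES):
    """Sum intensities of points with left <= m/z <= right for each feature.

    mz must be sorted ascending and aligned index-by-index with intensity.
    Assumption: simple summation stands in for whatever integration the
    actual pipeline uses.
    """
    values = {}
    for left, center, right in features:
        lo = bisect_left(mz, left)
        hi = bisect_right(mz, right)
        values[round(center)] = sum(intensity[lo:hi])
    return values


# Toy spectrum (hypothetical values, for illustration only).
mz = [3070.0, 3080.0, 3090.0, 3100.0, 3110.0, 3130.0]
intensity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
values = feature_values(mz, intensity)
```

In this toy spectrum the point at m/z 3070.0 lies outside every window and contributes to no feature.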

TABLE 33 Features Used for Classification

Classifier 1    Classifier 2
3085            3111
3244            3590
3444            3613
3842            3818
4590            3888
5158            4051
5177            4434
5570            4793
5720            5003
6589            5071
6809            5417
6890            5675
6995            5692
8478            5765
9208            5840
11913           5953
11938           6348
11965           7297
13781           8491
14543           8565
14595           9098
15509           10162
                10339
                10590
                10734
                10804
                10838
                11056
                11443
                11502
                11531
                11627
                11686
                11913
                11938
                12004
                12290
                12459
                12738
                12871
                12963
                13135
                13179
                13326
                13366
                13573
                13642
                13845
                14302
                14543
                14787
                18265
                21173
                21373
                23049
                24551

TABLE 34 Classification by Sample

Sample ID    Classification Label (Classifier 1 + Classifier 2)
4            Late
5            Early
23           Late
24           Late
30           Late
32           Late
33           Early
36           Late
37           Early
38           Early
39           Early
40           Early
41           Late
59           Late
60           Early
61           Early
62           Late
63           Early
64           Early
65           Late
66           Early
67           Early
68           Early
69           Early
70           Late
71           Late
72           Early
73           Early
74           Early
75           Early
76           Early
77           Early
78           Early
79           Late
80           Early
81           Early
82           Early
83           Early
84           Early
85           Early
86           Early
87           Early
88           Late
89           Early
90           Early
91           Early
93           Early
94           Early
95           Late
96           Late
97           Early
98           Late
100          Early
101          Early
102          Early
103          Late
105          Early
106          Early
107          Early
108          Late
109          Early
111          Early
112          Late
113          Early
114          Early
116          Early
118          Early
119          Early
120          Late
122          Early
123          Early
124          Early
125          Early
126          Late
127          Late
128          Late
129          Late
130          Late
131          Early
132          Early
133          Early
134          Late
135          Late
136          Early
137          Early
138          Early
140          Late
141          Early
142          Late
143          Late
144          Early
145          Late
146          Late
147          Early
149          Early
150          Early
151          Late
152          Early
153          Early
154          Early
156          Early
157          Late
158          Early
160          Late
161          Late
162          Early
163          Early
164          Early
165          Late
166          Early
167          Late
168          Early
169          Early
170          Early
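The labels in Table 34 come from the combination of Classifier 1 and Classifier 2. Because Classifier 2 is developed only on samples that Classifier 1 labels Late, one natural reading of the hierarchical combination is that an Early result from Classifier 1 is final and a Late result is passed to Classifier 2. The sketch below assumes that rule; the function `combined_label`, the two toy classifiers, and their thresholds are hypothetical stand-ins, not the trained classifiers.

```python
from typing import Callable, Mapping

FeatureVector = Mapping[int, float]          # feature label -> feature value
Classifier = Callable[[FeatureVector], str]  # returns "Early" or "Late"


def combined_label(sample: FeatureVector,
                   classifier1: Classifier,
                   classifier2: Classifier) -> str:
    """Assumed rule: Classifier 1 'Early' is final; otherwise ask Classifier 2."""
    if classifier1(sample) == "Early":
        return "Early"
    return classifier2(sample)


# Hypothetical stand-ins with made-up thresholds, for illustration only.
def toy_classifier1(sample: FeatureVector) -> str:
    return "Early" if sample[3085] > 1.0 else "Late"


def toy_classifier2(sample: FeatureVector) -> str:
    return "Early" if sample[3111] > 1.0 else "Late"


label = combined_label({3085: 0.5, 3111: 0.5}, toy_classifier1, toy_classifier2)
```

Under this reading a sample is reported as Late only when both classifiers return Late.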

TABLE 35 Proteins included in the extended leading edge set for acute inflammation response (Amigo1). * indicates proteins to the right of the minimum of RS and † indicates proteins with anti-correlations of at least as great magnitude as that at the maximum of RS.

UniProtID                 Protein Name                                       Correlation   P value
P02741                    C-reactive protein                                 0.759         <0.001
P01024                    Complement C3                                      0.733         <0.001
P11226                    Mannose-binding protein C                          0.687         <0.001
P01009                    alpha1-Antitrypsin                                 0.585         0.005
P01024                    Complement C3a anaphylatoxin                       0.585         0.005
P01031                    Complement C5                                      0.585         0.005
P07951                    Tropomyosin beta chain                             0.528         0.011
Q8NEV9, Q14213            Interleukin-27                                     0.523         0.012
P00738                    Haptoglobin                                        0.518         0.013
P12956                    ATP-dependent DNA helicase II 70 kDa subunit       0.515         0.013
P33681                    T-lymphocyte activation antigen CD80               0.456         0.028
P05156                    Complement factor I                                0.446         0.032
Q14624                    Inter-alpha-trypsin inhibitor heavy chain H4       0.410         0.049
Q00535, Q15078            Cyclin-dependent kinase 5:activator p35 complex    0.392         0.059
P06744                    Glucose phosphate isomerase                        0.385         0.065
P02743                    Serum amyloid P                                    0.385         0.065
P02679                    Fibrinogen gamma chain dimer                       0.379         0.068
P01023                    alpha2-Macroglobulin                               0.359         0.085
P01024                    Complement C3a anaphylatoxin des Arginine          0.353         0.089
P10600                    Transforming growth factor beta-3                  0.349         0.094
P08107                    Hsp70                                              0.338         0.104
P08697                    alpha2-Antiplasmin                                 −0.826*†      <0.001
P00747                    Angiostatin                                        −0.518*†      0.013
P02649                    Apolipoprotein E                                   −0.421*†      0.043
Q9BZR6                    Nogo Receptor/reticulon 4 receptor                 −0.410*†      0.049
P02765                    alpha2-HS-Glycoprotein                             −0.395*†      0.058
P08514, P05106            Integrin alpha-IIb:beta-3 complex                  −0.364*†      0.080
O00626                    Macrophage-derived chemokine                       −0.359*†      0.085
Q9Y5K2                    Kallikrein 4                                       −0.326*       0.118

TABLE 36 Proteins included in the extended leading edge set for complement system (Amigo9). † indicates proteins with anti-correlations of at least as great magnitude as that at the maximum of RS.

UniProtID                 Protein Name                                       Correlation   P value
P02748                    Complement C9                                      0.779         <0.001
P02741                    C-reactive protein                                 0.759         <0.001
P01024                    Complement C3                                      0.733         <0.001
P11226                    Mannose-binding protein C                          0.687         0.001
P01031, P13671            Complement C5b,6 Complex                           0.615         0.003
P01024                    Complement C3a anaphylatoxin                       0.585         0.005
P01031                    Complement C5                                      0.585         0.005
P12956                    ATP-dependent DNA helicase II 70 kDa subunit       0.515         0.013
P0C0L4, P0C0L5            Complement C4b                                     0.482         0.020
P05156                    Complement factor I                                0.446         0.032
P13671                    Complement C6                                      0.426         0.041
P02743                    Serum amyloid P                                    0.385         0.065
P05155                    C1-Esterase Inhibitor                              0.374         0.072
P01023                    alpha2-Macroglobulin                               0.359         0.085
P09871                    Complement C1s                                     0.359         0.085
P01024                    Complement C3a anaphylatoxin des Arginine          0.354         0.089
P00736                    Complement C1r                                     0.297         0.154
P10643                    Complement C7                                      0.292         0.161
P01024                    Complement C3b                                     0.287         0.169
P01031                    Complement C5a                                     0.287         0.169
Q15848                    Adiponectin                                        0.287         0.169
P07357, P07358, P07360    Complement C8                                      0.272         0.193
P00746                    Complement factor D                                0.262         0.210
P16109                    P-Selectin                                         −0.451†       0.030
P16581                    E-Selectin                                         −0.292†       0.161
O75636                    Ficolin-3                                          −0.262†       0.210

TABLE 37 Proteins included in the extended leading edge set for wound healing (Amigo16). * indicates proteins to the left of the maximum of RS and † indicates proteins with anti-correlations of at least as great magnitude as that at the minimum of RS.

UniProtID                 Protein Name                                       Correlation   P value
P08697                    alpha2-Antiplasmin                                 −0.826        <0.001
P03952                    Prekallikrein                                      −0.666        0.001
P04196                    Histidine-proline-rich glycoprotein                −0.579        0.005
P07359                    Platelet Glycoprotein Ib alpha                     −0.533        0.010
P06396                    Gelsolin                                           −0.528        0.011
P00747                    Angiostatin                                        −0.518        0.013
P07996                    Thrombospondin-1                                   −0.508        0.015
P00747                    Plasminogen                                        −0.503        0.016
P02649                    Apolipoprotein E (isoform E2)                      −0.431        0.038
P02649                    Apolipoprotein E                                   −0.421        0.043
P53582                    Methionine aminopeptidase 1                        −0.405        0.051
P02649                    Apolipoprotein E3                                  −0.400        0.055
P02649                    Apolipoprotein E4                                  −0.400        0.055
P37023                    Activin receptor-like kinase 1                     0.467*†       0.025
P02671, P02675, P02679    Fibrinogen                                         0.456*†       0.028
P02679                    Fibrinogen gamma chain dimer                       0.379*        0.068
P02671, P02675, P02679    D-dimer                                            0.364*        0.080

TABLE 38 Proteins included in the extended leading edge set for acute phase (UNIPROT1). * indicates proteins to the right of the minimum of RS and † indicates proteins with anti-correlations of at least as great magnitude as that at the maximum of RS.

UniProtID                 Protein Name                                       Correlation   P value
P02741                    C-reactive protein                                 0.759         <0.001
P11226                    Mannose-binding protein C                          0.687         0.001
P01009                    alpha1-Antitrypsin                                 0.585         0.005
P00738                    Haptoglobin                                        0.518         0.013
P0DJI8                    Serum amyloid A                                    0.513         0.014
P18428                    Lipopolysaccharide-binding protein                 0.482         0.020
Q14624                    Inter-alpha-trypsin inhibitor heavy chain H4       0.410         0.049
P02743                    Serum amyloid P                                    0.385         0.065
P02671, P02675, P02679    D-dimer                                            0.364         0.080
P01023                    alpha2-Macroglobulin                               0.359         0.085
P08697                    alpha2-Antiplasmin                                 −0.826*†      <0.001
P02765                    alpha2-HS-Glycoprotein                             −0.395†       0.058

1. A method for detecting class labels in a melanoma patient, comprising the steps of: a) performing mass spectrometry on a blood-based sample of the patient and obtaining mass spectrometry data of the sample; b) performing a classification of the mass spectrometry data with the aid of a computer implementing a classifier, wherein the classifier is developed from a development set of samples from melanoma patients treated with the high dose IL2 therapy and consists of a hierarchical combination of classifiers 1 and 2, wherein classifier 1 is developed from the development set of samples and a set of mass spectral features identified as being associated with an acute response biological function and generates either an Early class label and a Late class label, or the equivalent, and wherein classifier 2 is developed from a subset of samples in the development set classified as Late by classifier 1 and also generates an Early class label and a Late class label or the equivalent, wherein class labels are detected for the patient.
2. The method of claim 1, wherein Classifier 1 and Classifier 2 use the features for performing classification of the sample recited in Appendix B or a subset thereof.
3. A method for detecting a class label in a melanoma patient, comprising the steps of: a) performing mass spectrometry on a blood-based sample of the patient and obtaining mass spectrometry data of the sample; b) performing a classification of the mass spectrometry data with the aid of a computer implementing a classifier 2, wherein the classifier 2 is developed from a subset of a development set of samples from melanoma patients treated with the high dose IL2 therapy which have been classified as Late or the equivalent by a classifier 1 using a set of mass spectral features identified as being associated with an acute response biological function; wherein class labels are detected for the patient.
4. The method of claim 3, wherein Classifier 2 uses the features for performing classification of the sample recited in Appendix B or a subset thereof.
5. A classifier development method comprising the steps of: a) obtaining a development set of blood-based samples from a population of melanoma patients in advance of treatment with high dose IL2 therapy; b) performing mass spectrometry on the development set of samples; c) with the aid of a computer developing a first classifier using a set of mass spectral features which are associated with an acute response biological function, the first classifier generating a class label of either early or late or the equivalent; d) classifying the development set of samples with the first classifier; e) selecting a subset of the samples which are classified as Late by the first classifier; f) with the aid of the computer, developing a second classifier using the subset selected in step e) as the development set of the second classifier, the second classifier generating a class label of early or late or the equivalent; and g) defining a final classification schema which combines the first classifier and the second classifier.
6. The method of claim 5, wherein the first and second classifiers use features for classification recited in Appendix B or a subset thereof.
7. The method of claim 1, wherein classifier 1 and classifier 2 are developed using the procedure of FIGS. 7A-7B.
8. The method of claim 5 wherein the first and second classifiers are developed using the procedure of FIGS. 7A-7B.
9. (canceled)
10. (canceled)
11. (canceled)
12. A method of detecting class labels for a melanoma patient comprising performing the method of claim 1, as well as classifying the sample of the patient with a classifier developed from mass spectral data of a set of blood based samples obtained from melanoma patients treated with an anti-PD-1 drug to generate a class label, wherein class labels are detected.
13. A method of detecting a class label for a melanoma patient on high dose IL2 therapy by performing a classification of the sample with a classifier developed from mass spectral data of a set of blood based samples obtained from melanoma patients treated with an anti-PD-1 drug, the classifier generating a class label of Late or the equivalent and Early or the equivalent, wherein a class label is detected.
14. A method of detecting a class label for a melanoma patient on high dose IL2 therapy by performing a classification of the sample with a hierarchical combination of tumor size classifiers developed from mass spectral data of a set of blood based samples obtained from melanoma patients treated with an anti-PD-1 drug, the hierarchical combination of tumor size classifiers generating a class label of either Good or the equivalent or Bad or the equivalent, wherein a class label is detected.